Abstract



Anatomical shape differences in cortical structures in the brain can be associated with various neuropsychiatric and neuro-developmental diseases or disorders. Labeled Cortical Distance Map (LCDM), a recently devised tool, can be a powerful means to quantify such morphometric differences. In this article, we investigate various issues regarding the analysis of LCDM distances in relation to morphometry. The length of the LCDM distance vector gives the number of voxels (approximately a multiple of the volume in mm³); the median, mode, range, and variance of LCDM distances are all suggestive of size, thickness, and shape differences. Various statistical tests are employed to detect left-right morphometric asymmetry, group differences, and stochastic ordering (i.e., cdf differences) of these quantities. However, these measures provide a crude summary of the LCDM distances, which may convey much more information about the tissue in question. To utilize more of this information, we pool (merge) the LCDM distances from subjects in the same group or condition. We check the similarity of the distributions of LCDM distances for subjects in the same group using kernel density plots, and also investigate the influence of outliers (i.e., subjects with markedly different LCDM distance distributions). The statistical methodology we employ requires normality and within- and between-sample independence. We demonstrate that violations of these assumptions have a mild influence on the tests. We specify the types of alternatives to which the parametric and nonparametric tests are more sensitive. We also show that the pooled LCDM distances provide powerful results for group differences in distribution, left-right morphometric asymmetry of the tissues, and variation of LCDM distances. As an illustrative example, we use gray matter (GM) tissue of ventral medial prefrontal cortices (VMPFCs) from subjects with major depressive disorder, subjects at high risk, and control subjects. We find significant evidence that VMPFCs of subjects with depressive disorders differ in shape from those of normal subjects. Although the methodology is applied here to the LCDM distances of the GM of the VMPFC, it is also valid for morphometric measures of other organs or tissues and for distances similar to LCDM distances.



1 Introduction 



Quantification of morphometric properties of neocortical tissues is a major component of Computational Anatomy [1-21]. Our group recently developed the Labeled Cortical Distance Mapping (LCDM) technique [22], which was shown to be useful in identifying cortical thinning in the cingulate cortex in subjects with Alzheimer's Disease [23] and in subjects with schizophrenia [24] in comparison to control subjects.

Cortical thinning has been observed in other regions in a variety of neuro-developmental and neuro-degenerative disorders (see the above references for examples). In particular, functional imaging studies implicate the ventral medial prefrontal cortex (VMPFC) in major depressive disorder (MDD) [25, 26], and these findings have been correlated with shape changes observed in structural imaging studies [27, 28]. The prefrontal cortex, together with the amygdala and hippocampus, plays an important role in modulating emotions and mood. Structural imaging studies in MDD have largely focused on adult onset, with only a few focused on early onset MDD, which has been associated with structural deficits in the subgenual prefrontal cortex, a subregion of the VMPFC [28]. Furthermore, the whole VMPFC has been examined in a twin study of early onset MDD [29].

Several studies of the VMPFC and related structures have been based on analysis of the cortex as a whole [4, 17, 30, 31], whereas others have pursued a more localized analysis that attempts to deal with the highly folded gray/white matter cortex [32]. In this way the laminar shape of the brain tissue can be quantified in great detail. Two aspects of laminar shape are structural formation (such as the surface and form of the tissue) and scale or size (such as volume and surface area). Throughout the article, we refer to all aspects of laminar shape (including shape and size) as the morphometry of the tissue; the surface structure and form will be referred to as "shape" and the scale as "size".

The first step in creating LCDM metrics involves segmenting MRI subvolumes of the tissue in question. Then every voxel is labeled by tissue type as gray matter (GM), white matter (WM), or cerebrospinal fluid (CSF). For every voxel in the image volume, the (normal) distance from the center of the voxel to the closest point on the GM/WM surface is computed. A signed distance is used to indicate the location of each voxel with respect to the GM/WM surface; distances are positive for GM and CSF voxels and negative for WM voxels. See Figure 1 for a schematic flowchart of the LCDM procedure and Figure 2 for a two-dimensional illustration of the LCDM distance calculation and non-normalized histograms of the (signed) distances for GM, WM, and CSF.

As an illustrative example, we investigate GM tissue in VMPFCs in a study of early onset depression in twins. Previously, we analyzed various morphometric measures (i.e., volume and descriptive statistics based on LCDM distances such as the median, mode, range, and variance) and demonstrated that, except for left-right asymmetry and the correlation between left and right measures, these variables usually failed to discriminate healthy subjects from depressed ones [33]. One reason for this is that the subjects in our data set are age-matched female twins, who potentially have VMPFCs similar in size. Moreover, this might be partly due to the size of the groups; i.e., if we had more participants in the study, these measures would have been more likely to yield significant differences between the groups. On the other hand, by using only a descriptive summary statistic of the numerous distances for each person, we essentially lose most of the information provided by LCDM measures. In this article we provide a strategy to avoid this information loss by pooling the LCDM distances. We use the distances pooled by condition or group to detect morphometric differences such as differences in shape, size, thickness variation, and left-right asymmetry. However, there is a downside to pooling: the pooled distances lack within-sample independence, because the distances of neighboring voxels within a given hemisphere of a person are dependent. Moreover, there is also dependence between distances from the left and right VMPFCs of each subject. We demonstrate that within-sample dependence does not affect the tests in terms of empirical significance levels (or Type I errors) or power, and that left-right dependence only makes the asymmetry tests less powerful than they could have been.

We describe the acquisition of LCDM distances for VMPFCs in Section 2.1 and the methods we employ in Section 2.2, describe the example data set in Section 2.3, provide an analysis of volumes and descriptive measures based on LCDMs in Section 3 and outlier detection in Section 4.1, analyze the pooled distances in Section 4, and present the influence of violations of the assumptions in Section 5. In [33] we computed simple descriptive measures for the left and right VMPFCs of each subject and analyzed these measures for group differences. These LCDM-based descriptive statistics are also analyzed in more detail in this article. For more technical detail on LCDM see, e.g., [36], where the accuracy of LCDMs and the variability of cortical mantle distance profiles are also discussed. In [36], LCDM is used for detecting differences in the cingulate due to dementia of Alzheimer's type (DAT). There, the usual Welch's t-test is applied for volume comparisons, and the Wilcoxon rank sum test is applied for group comparisons on randomly selected subsamples of the LCDM distances. In this article, rather than subsampling or summarizing, we use the entire set of LCDM distances by pooling the distances for each group, and we investigate the validity of the underlying assumptions of the tests used. Since in this article we focus on the use of LCDM distances rather than on the clinical implications of the genetic influence (due to twinness), we ignore the twin influence for most of the current analysis.



2 Methods 

2.1 Data Acquisition 

A cohort of 34 right-handed young female twin pairs between the ages of 15 and 24 years was obtained from the Missouri Twin Registry and used to study cortical changes in the VMPFC associated with MDD. Both monozygotic and dizygotic twin pairs were included, of which 14 pairs were controls (Ctrl) and 20 pairs had one twin affected with MDD; the co-twins of the affected twins were designated as the High Risk (HR) group.

Three high-resolution T1-weighted MPRAGE magnetic resonance scans of this population were acquired using a Siemens scanner with 1 mm³ isotropic resolution. Images were then averaged, corrected for intensity inhomogeneity, and interpolated to 0.5 × 0.5 × 0.5 mm³ isotropic voxels.

Following Ratnanather et al. [32], a region of interest (ROI) comprising the prefrontal cortex stripped of the basal ganglia, eyes, sinus, cavity, and temporal lobe was defined manually and segmented into gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) by Bayesian segmentation using the expectation maximization algorithm [34]. A triangulated representation of the cortex at the GM/WM boundary was generated using isosurface algorithms [35].

See Figure 1 for the schematic flowchart of the LCDM measurement procedure and Figure 2 for an illustration of normal distances from GM and WM voxels to the interface in a two-dimensional setting, together with non-normalized histograms of LCDM distances for the GM, WM, and CSF of a cingulate tissue. The GM tissue comprises most of the cortex; by construction, most GM distances are positive, most WM distances are negative, and all CSF distances are positive. The mismatch of the signs for some GM and WM voxels close to the GM/WM boundary is due to the way the surface is constructed in relation to how the voxels are labeled: the surface always intersects voxels and has to maintain a somewhat smooth shape. So some correctly labeled GM and WM voxels may fall on the side of the surface to which they do not belong; however, these voxels constitute a small number of voxels and do not affect our overall analysis. Let $V$ be the regular lattice of voxels defining the region of interest, and let $S(A)$ be the triangulated graph representing the smooth boundary at the GM/WM surface. Then the distance computation algorithm is specified as follows [36-38]:

for all v_i ∈ V do
    s_closest ← a point in S(A) such that
    for all s_j ∈ S(A) do
        d(s_closest, v_i) ≤ d(s_j, v_i)
    end for
    D_i ← d(s_closest, v_i)
end for

where $d(\cdot,\cdot)$ stands for the usual Euclidean distance, $v_i$ is the $i$th voxel, $s_j$ is the $j$th point in $S(A)$, and $D_i$ is the $i$th distance (i.e., the distance for the $i$th voxel). That is, an LCDM distance is a set distance function $d: v_i \in V \mapsto d\big(\mathrm{centroid}(v_i), S(A)\big)$, which is the distance between the centroid (or center of mass) of $v_i$ and the set $S(A)$. More precisely,

$$D_i := d\big(\mathrm{centroid}(v_i), S(A)\big) = \min_{s \in S(A)} \lVert \mathrm{centroid}(v_i) - s \rVert_2.$$

To distinguish the distances for voxels from different tissue types, we denote the distance for the $i$th voxel in tissue type label as $D_i(\text{label})$ for label ∈ {WM, GM, CSF}.
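The nested loop above can be sketched in Python with NumPy. This is a minimal brute-force sketch, not the implementation of [36-38]: it assumes $S(A)$ is represented by a finite set of points (for example, the vertices of the triangulation), which only approximates the exact point-to-triangle distance, and the `signs` argument is a hypothetical input standing in for whatever side-of-surface test assigns the sign.

```python
import numpy as np


def lcdm_distances(centroids, surface_points, signs=None):
    """Brute-force LCDM distances: D_i = min_{s in S(A)} ||centroid(v_i) - s||_2.

    centroids      : (n_voxels, 3) voxel centres in mm
    surface_points : (n_points, 3) points sampled from the GM/WM surface S(A)
    signs          : optional (n_voxels,) array of +1/-1 (hypothetical input)
                     marking which side of the surface each voxel lies on
    """
    c = np.asarray(centroids, dtype=float)
    s = np.asarray(surface_points, dtype=float)
    # Both loops of the pseudocode, vectorised: dist[i, j] = ||c_i - s_j||_2
    dist = np.linalg.norm(c[:, None, :] - s[None, :, :], axis=2)
    d = dist.min(axis=1)  # distance to the closest surface point
    if signs is not None:
        d = d * np.asarray(signs, dtype=float)
    return d


# Toy example: a flat "GM/WM surface" z = 0 sampled on a 0.25 mm grid,
# one voxel 1.25 mm on the GM side and one 0.75 mm on the WM side.
xs = np.arange(-2.0, 2.01, 0.25)
gx, gy = np.meshgrid(xs, xs)
surface = np.column_stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)])
voxels = np.array([[0.25, 0.25, 1.25], [0.25, 0.25, -0.75]])
signed = lcdm_distances(voxels, surface, signs=[+1, -1])
print(signed)
```

The broadcast uses memory proportional to n_voxels × n_points; for realistic volumes a k-d tree such as `scipy.spatial.cKDTree` (its `query` method returns nearest-point distances and indices) gives the same nearest-point distances far more cheaply.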



Volume and GM distributions as a function of position from the GM/WM interface can be derived from 
LCDMs. The GM volume is simply the total number of GM voxels (times the volume of a single voxel). As 
variability of total GM volume can be a confounding factor in studying cortical thickness, normalizing LCDMs 
of each individual by its total GM volume generates Cortical Mantle Distance (CMD) profiles. Integrating 
the density function results in a distribution function that represents the percentage of GM as a function of 
distance from the cortical surface. 
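A CMD profile as described above can be sketched as a histogram of one subject's GM distances divided by the subject's total GM voxel count (the voxel volume cancels), with its cumulative sum giving the fraction of GM within a given distance. The bin width and the [-0.5, 5.5] mm window are assumptions for illustration, and the gamma-distributed data are synthetic.

```python
import numpy as np


def cmd_profile(distances, lo=-0.5, hi=5.5, bin_width=0.1):
    """Sketch of a Cortical Mantle Distance (CMD) profile for one subject.

    Returns the bin edges, the fraction of GM voxels in each distance bin
    (the histogram normalized by total GM volume; divide by bin_width for a
    density per mm) and its cumulative sum, i.e. the fraction of GM up to each
    bin's upper edge. Distances outside [lo, hi] are ignored by np.histogram.
    """
    d = np.asarray(distances, dtype=float)
    n_bins = int(round((hi - lo) / bin_width))
    edges = np.linspace(lo, hi, n_bins + 1)
    counts, _ = np.histogram(d, bins=edges)
    fraction = counts / counts.sum()
    cumulative = np.cumsum(fraction)
    return edges, fraction, cumulative


rng = np.random.default_rng(0)
toy = rng.gamma(shape=2.0, scale=0.8, size=5000)  # synthetic right-skewed distances
edges, density, cumulative = cmd_profile(toy)
```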

2.2 Statistical Tests 

We use various morphometric measures of the left (and right) VMPFCs in our analysis. LCDM distances are sufficient to determine the volumes (in mm³) of VMPFCs. For each person, we also record the median, mode, range, and variance of the LCDM distances. For comparisons of left (and right) volume, median, mode, range, and variance between groups, we cannot apply the Kruskal-Wallis (K-W) test for equality of the distributions of these measures across all groups, because there is an inherent (genetic) dependence between the MDD and HR groups: the co-twin of each MDD subject is by definition an HR subject. So we use the Wilcoxon rank sum test to compare MDD and Ctrl VMPFCs, and also HR and Ctrl VMPFCs. On the other hand, we use the Wilcoxon signed rank test for MDD and HR VMPFCs, due to the dependence of the samples. We then adjust these p-values for simultaneous pairwise comparisons by Holm's correction method. See [45] for the tests and [39] for Holm's correction. We resort to the nonparametric tests only when the assumptions of normality and homogeneity of the variances (HOV) are not met. When the assumptions are met and only the parametric tests yield significant results, we use the parametric counterparts (Welch's t-test for independent samples and the paired t-test).
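The per-subject comparison scheme above can be sketched with SciPy. Here `scipy.stats.mannwhitneyu` computes the Mann-Whitney U test, which is equivalent to the Wilcoxon rank sum test, `scipy.stats.wilcoxon` is the signed rank test for the paired (MDD, HR) co-twins, and Holm's step-down adjustment is written out by hand. The function names, two-sided alternatives, and synthetic data are assumptions for illustration.

```python
import numpy as np
from scipy import stats


def holm_adjust(pvals):
    """Holm's step-down adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    adjusted = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(np.argsort(p)):
        # the k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1
        candidate = min(1.0, (m - rank) * p[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


def pairwise_group_tests(mdd, hr, ctrl):
    """Holm-adjusted p-values for one per-subject measure (e.g. left volume).

    mdd[j] and hr[j] must belong to the same twin pair, because the
    (MDD, HR) comparison uses the paired signed rank test.
    """
    raw = {
        "MDD,Ctrl": stats.mannwhitneyu(mdd, ctrl, alternative="two-sided").pvalue,
        "HR,Ctrl": stats.mannwhitneyu(hr, ctrl, alternative="two-sided").pvalue,
        "MDD,HR": stats.wilcoxon(mdd, hr).pvalue,
    }
    return dict(zip(raw, holm_adjust(list(raw.values()))))


# Synthetic example with group sizes of 20 MDD, 20 HR and 28 Ctrl subjects.
rng = np.random.default_rng(1)
mdd = rng.normal(5000.0, 600.0, 20)
hr = mdd + rng.normal(0.0, 300.0, 20)  # co-twins: correlated with MDD
ctrl = rng.normal(5000.0, 600.0, 28)
print(pairwise_group_tests(mdd, hr, ctrl))
```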

For morphometric asymmetry of left and right VMPFCs, we compare these measures between left and 
right VMPFCs (overall and for each group) by Wilcoxon signed rank test which is a paired (for differences of 
measures) test (see, e.g., [40]). 

We perform correlation analysis between left and right morphometric measures using Spearman's rank correlation coefficient and the corresponding test of the null hypothesis that the correlation coefficient is zero. We also compare cdfs (cumulative distribution functions) between groups for left (and right) VMPFCs using the Kolmogorov-Smirnov (K-S) test (see, e.g., [40]).

When we pool (i.e., merge) the LCDM distances by group, there is an inherent dependence between LCDM distances due to the spatial correlation between neighboring voxels of a left or right VMPFC. For the parametric tests (ANOVA F-test and t-test), the assumptions of normality and within-sample independence are violated, while for the nonparametric tests (K-W test and Wilcoxon tests) only within-sample independence is violated. Since more assumptions are violated for the parametric tests, the nonparametric tests are expected to be more appropriate in our analysis. However, since the correlation structure is similar for each person (hence for each group), its influence on both parametric and nonparametric tests is negligible. See Section 5 for a Monte Carlo simulation study that justifies the use of these tests for such data structures. We use the K-W test for equality of the distributions of the pooled distances for all left groups and ANOVA F-tests (with and without HOV) for equality of the means for all left groups. If the K-W test yields a significant p-value, we use pairwise Wilcoxon rank sum tests to compare pairs of groups; similarly, if one of the ANOVA F-tests is significant, we use pairwise t-tests to compare pairs of groups. We then adjust these p-values for simultaneous pairwise comparisons by Holm's correction method [39]. We perform a similar analysis for the right groups. For morphometric asymmetry of left and right VMPFCs, we compare these measures between left and right VMPFCs (overall and for each group) by the Wilcoxon rank sum test (see, e.g., [40]) and Welch's t-test. Although there is an inherent dependence between the MDD and HR VMPFC distances, and between left and right distances, we do not use the Wilcoxon signed rank test (for dependent samples) or the matched-pair t-test, because the distances for the left and right VMPFCs cannot be matched (paired). For the same reason, we cannot perform correlation analysis on these groups. We also compare cdfs between groups for left (and right) VMPFCs using the K-S test (see, e.g., [40]).
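The omnibus step for pooled distances can be sketched as follows, assuming the pooled distances are held in a dictionary keyed by group (a hypothetical layout). Only the equal-variance ANOVA F-test is shown, and the p-values do not account for the within-subject spatial dependence discussed above; the data are synthetic.

```python
import numpy as np
from scipy import stats


def pooled_omnibus_tests(pooled):
    """Omnibus tests on group-pooled LCDM distances.

    pooled : dict mapping group name -> 1-D array of pooled distances.
    Pooling ignores the spatial dependence of neighbouring voxels, so these
    p-values rely on the robustness argument made in the text.
    """
    samples = [np.asarray(v, dtype=float) for v in pooled.values()]
    return {
        "kruskal_p": stats.kruskal(*samples).pvalue,  # K-W test
        "anova_p": stats.f_oneway(*samples).pvalue,   # ANOVA F-test assuming HOV
    }


rng = np.random.default_rng(2)
pooled = {
    "MDD": rng.gamma(2.0, 0.8, 5000) + 0.3,  # synthetic, shifted group
    "HR": rng.gamma(2.0, 0.8, 5000),
    "Ctrl": rng.gamma(2.0, 0.8, 5000),
}
print(pooled_omnibus_tests(pooled))
```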

The test of equality or homogeneity of the variances (HOV) of pooled distances is also important, because variance differences between groups might be indicative of differences in the variation of the morphometry of VMPFCs. Therefore, we perform the HOV test using Levene's test with absolute deviations around the median, which is also known as the Brown-Forsythe (B-F) HOV test (see, e.g., [41]).
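The Brown-Forsythe test is a one-way ANOVA on the absolute deviations of each observation from its group median. The sketch below writes that out and checks it against SciPy's `scipy.stats.levene` with `center="median"`; the gamma-distributed groups are synthetic.

```python
import numpy as np
from scipy import stats


def brown_forsythe(*groups):
    """Brown-Forsythe HOV test written out: one-way ANOVA on |x - group median|."""
    deviations = [np.abs(np.asarray(g, dtype=float) - np.median(g)) for g in groups]
    return stats.f_oneway(*deviations)


rng = np.random.default_rng(3)
a = rng.gamma(2.0, 0.8, 2000)
b = rng.gamma(2.0, 1.0, 2000)  # larger spread
c = rng.gamma(2.0, 0.8, 2000)
manual = brown_forsythe(a, b, c)
library = stats.levene(a, b, c, center="median")
print(manual.statistic, library.statistic)
```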



2.3 Example Data Set 



As an illustrative example, we use the GM of left and right VMPFCs. The prefrontal cortex, together with the amygdala and hippocampus, plays an important role in modulating emotions and mood. For the location of the VMPFC in the brain, see Figure 3. Abnormalities in the structure and function of the prefrontal cortex have been demonstrated in depression [25, 26]. Previous structural imaging studies on Major Depressive Disorder (MDD) have largely focused on adult onset, while only a few have focused on early onset MDD. Botteron et al. [44] have conducted structural imaging studies of early onset MDD in the ventral medial prefrontal cortex (VMPFC) region of twins. Structural deficits in the subgenual prefrontal cortex have been implicated in early onset MDD [29].

For convenience in notation, we suppress the label argument in the distances, as we only consider the GM tissue. Let $D^L$ be the set of LCDM distances $D^L_{ijk}$, where $D^L_{ijk}$ is the distance associated with the $k$th voxel in the GM of the left VMPFC of subject $j$ in group $i$, for $j = 1, \dots, n_i$ and $i = 1, 2, 3$ (group 1 is MDD, group 2 is HR, and group 3 is Ctrl). Right VMPFC distances $D^R$ are denoted similarly as $D^R_{ijk}$. The LCDM distances for GM in left and right VMPFCs, $D^L$ and $D^R$, are plotted by subject in Figure 4. The automated segmentation is more reliable for the GM close to the GM/WM surface due to the high level of contrast. However, there are still voxels which, although labeled appropriately, have the incorrect sign; that is, some of the $D^L_{ijk}$ and $D^R_{ijk}$ are negative for each subject. Large distances are potentially less reliable, due to the diminishing contrast around the boundary of the GM and CSF compartments. Thus, based on Figure 4 and prior anatomical knowledge of VMPFCs (e.g., [42]), we only keep distances larger than -0.5 mm, so that mislabeled WM is excluded from the data, and the upper limit is set to 5.5 mm, so that the error due to less reliable large distances is reduced. Observe that in the left VMPFC distances, MDD subjects 8 and 11, HR subjects 3 and 18, and Ctrl subjects 6, 10, 11, 12, and 22 appear to differ more from the rest, with Ctrl subjects 11 and 12 being "thinner" and the others "thicker" than the other left VMPFCs. On the other hand, in the right VMPFC distances, the VMPFCs of MDD subject 4, HR subject 5, and Ctrl subjects 12 and 21 appear to differ more, with the VMPFCs of HR subject 5 and Ctrl subject 12 being thinner and the others thicker than the rest of the right VMPFCs. Note that Figure 4 provides only a preliminary assessment of the reliability of LCDM distances, since it does not show the distributional behavior of the distances, but only problems with small (negative) and large distances. As a technical aside, we note that only 0.16% of left distances and 0.14% of right distances are below -0.5 mm, while only 0.22% of left distances and 0.07% of right distances are above 5.5 mm.
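Percentages like those above can be computed with a short helper. This is a minimal sketch, assuming the input is the array of all distances for one hemisphere; the toy values are made up for illustration.

```python
import numpy as np


def trim_percentages(distances, lo=-0.5, hi=5.5):
    """Percent of distances below lo and above hi (the values that are discarded)."""
    d = np.asarray(distances, dtype=float)
    return 100.0 * np.mean(d < lo), 100.0 * np.mean(d > hi)


below, above = trim_percentages([-1.0, 0.2, 1.4, 2.7, 6.0])
print(f"{below:.2f}% below -0.5 mm, {above:.2f}% above 5.5 mm")
```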

3 Analysis of Volumes and Descriptive Measures of LCDM Distances of VMPFCs

Volume is a measure of the size of VMPFCs. Let $V^L_{ij}$ be the volume of the left VMPFC of subject $j$ in group $i$, for $j = 1, \dots, n_i$ and $i = 1, 2, 3$. Right VMPFC volumes are denoted similarly as $V^R_{ij}$. For each person, the number of LCDM distances recorded gives the number of voxels, which in turn yields the volume in mm³ up to a constant factor, since each voxel is a cube of size 0.5 × 0.5 × 0.5 mm³ = 0.125 mm³. That is,

$$V^L_{ij} = 0.125 \times \sum_{k} I\big(D^L_{ijk} \in [-0.5, 5.5]\big),$$

where $I(\cdot)$ stands for the indicator function. Right VMPFC volumes can be obtained similarly.
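The volume formula above translates directly into code: count the distances inside the retained window and multiply by the voxel volume. A minimal sketch for one subject's GM distances; the input values are illustrative only.

```python
import numpy as np

VOXEL_VOLUME_MM3 = 0.5 ** 3  # each isotropic voxel is 0.5 x 0.5 x 0.5 mm^3


def vmpfc_volume(distances, lo=-0.5, hi=5.5):
    """V = 0.125 * sum_k I(D_k in [lo, hi]) for one subject's GM distances."""
    d = np.asarray(distances, dtype=float)
    return VOXEL_VOLUME_MM3 * np.count_nonzero((d >= lo) & (d <= hi))


# 8 voxels inside the window and 2 outside give 8 * 0.125 = 1.0 mm^3
print(vmpfc_volume([0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 5.5, -0.7, 6.2]))
```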
3.1 Analysis of Volumes of VMPFCs 

We analyze the volumes of VMPFCs for several purposes: (i) to outline the methodology we will employ for other quantities in this article, (ii) to compare the volume results with other comparisons, and (iii) to check for volume (size or scale) differences due to groups. Each volume value is labeled by the group of the subject it comes from; for example, if a volume is that of a VMPFC of a person in the MDD group, then the corresponding group is MDD. See Figure 5 for the (jittered) scatter plot of the volumes by group for left and right VMPFCs. See also Table 1 for the sample sizes, means, and standard deviations of the volumes, overall and for each group, for left and right VMPFCs. Observe that left VMPFC volumes tend to be larger than right VMPFC volumes. Moreover, the mean volumes of the left VMPFC seem to differ more between groups than those of the right VMPFC.

To find which pairs, if any, manifest significant differences, we perform pairwise comparisons for left VMPFC volumes by Wilcoxon rank sum tests for the (MDD, Ctrl) and (HR, Ctrl) pairs and by the Wilcoxon signed rank test for the (MDD, HR) pair, using Holm's correction. The p-values for the pairwise tests for left and right VMPFC volumes are presented in Table 2, where $p_w$ stands for the p-value of the Wilcoxon test (rank sum or signed rank, as appropriate), $p_t$ stands for the p-value of Welch's t-test, significant results are marked with an asterisk (*), (<) stands for the alternative that first-group volumes tend to be less than second-group volumes, and (>) stands for the alternative that first-group volumes tend to be greater than second-group volumes. More precisely, given two groups of random variables $X$ and $Y$, the (<) alternative means $F_X(x) > F_Y(x)$, where $F_X$ and $F_Y$ are the distribution functions of $X$ and $Y$, respectively, or $E[X] < E[Y]$, where $E[\cdot]$ stands for expectation, or $P(X > Y) < P(X < Y)$. Notice that all of these forms of the alternative convey the idea that "$X$ tends to be smaller than $Y$" in some way [40]. Observe that none of the pairs manifests significant differences in the distribution of volumes. For the t-test, (<) stands for the alternative that the first-group mean volume is less than the second-group mean, and (>) stands for the alternative that the first-group mean is greater than the second-group mean. None of the pairs indicates significant differences in mean volumes.
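The (<) alternative can be requested directly from SciPy, as in this minimal sketch on synthetic data; for the (>) alternative pass `alternative="greater"` instead. The `alternative` argument of `scipy.stats.ttest_ind` requires SciPy 1.6 or later, and the helper name is hypothetical.

```python
import numpy as np
from scipy import stats


def one_sided_less(x, y):
    """p-values for the (<) alternative: X tends to be smaller than Y."""
    # Wilcoxon rank sum (Mann-Whitney U): X stochastically smaller than Y
    p_w = stats.mannwhitneyu(x, y, alternative="less").pvalue
    # Welch's t-test: mean of X smaller than mean of Y
    p_t = stats.ttest_ind(x, y, equal_var=False, alternative="less").pvalue
    return p_w, p_t


rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, 200)
y = rng.normal(1.0, 1.0, 200)
print(one_sided_less(x, y))  # x tends to be smaller: small p-values
print(one_sided_less(y, x))  # wrong direction: large p-values
```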

Since the (MDD, HR) volumes are dependent, we only test for homogeneity (equality) of the variances of volumes for the (MDD, Ctrl) and (HR, Ctrl) pairs. We observe that HOV of MDD and Ctrl left VMPFC volumes is not rejected (p > .36), and likewise for HR and Ctrl left VMPFC volumes (p = .3202). Similarly, HOV of MDD and Ctrl right VMPFC volumes is not rejected (p = .1038), and likewise for HR and Ctrl right VMPFC volumes (p = .1038). This suggests that the spread or variation in volumes of left (and right) VMPFCs within groups does not differ significantly between the groups considered.

We also test for differences between left and right VMPFC volumes, i.e., left-right volumetric asymmetry. For the associated p-values, see Table 3, where $p_w$ stands for the p-value of the Wilcoxon signed rank test and $p_t$ for that of the paired t-test. For testing overall left-right asymmetry, we pool all the left volumes into one set and all the right volumes into another. Observe that left VMPFC volumes are significantly larger than right VMPFC volumes ($p_w$ = .0064 and $p_t$ = .0087). Among the groups, only the MDD VMPFC shows significant volumetric left-right asymmetry, with $p_w$ = .0360 and $p_t$ = .0233. That is, the depressed subjects tend to have more left-right volumetric asymmetry than HR and Ctrl subjects, in that left volumes tend to be significantly larger than right volumes in MDD subjects.
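The paired left-right comparison can be sketched as below, assuming `left[j]` and `right[j]` come from the same subject. The volumes are synthetic, with left systematically larger by construction, and the helper name is hypothetical.

```python
import numpy as np
from scipy import stats


def left_right_asymmetry(left, right):
    """Paired tests of left-right asymmetry; left[j] and right[j] are one subject."""
    p_w = stats.wilcoxon(left, right).pvalue   # Wilcoxon signed rank test
    p_t = stats.ttest_rel(left, right).pvalue  # paired t-test
    return p_w, p_t


rng = np.random.default_rng(5)
right = rng.normal(5000.0, 500.0, 30)             # synthetic right volumes (mm^3)
left = right + 150.0 + rng.normal(0.0, 50.0, 30)  # left systematically larger
print(left_right_asymmetry(left, right))
```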

Spearman's rank correlation coefficients, denoted $\rho_S$, and the associated p-values between the left and right volumes are given in Table 4, where MDDL refers to volumes of the left VMPFC of MDD patients, and MDDR, HRL, and HRR are defined similarly. Observe that there is significant (positive) correlation between the left and right volumes for overall, MDD, HR, and Ctrl VMPFCs. A positive correlation implies, for example, that subjects with larger left volumes tend to have larger right volumes. On the other hand, (MDD, HR) left volumes are mildly correlated, while (MDD, HR) right volumes are not significantly correlated.
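A minimal sketch of the left-right Spearman analysis for one group, with observations paired by subject (or by twin pair for the (MDD, HR) comparison). Tuple unpacking of `scipy.stats.spearmanr` works across SciPy versions; the values are made up, with the ranks differing by one adjacent swap so that $\rho_S = 1 - 6 \cdot 2 / (5 \cdot 24) = 0.9$.

```python
import numpy as np
from scipy import stats


def left_right_spearman(left, right):
    """Spearman's rho and the p-value for H0: rho = 0 (paired by subject)."""
    rho, p = stats.spearmanr(left, right)
    return rho, p


# Made-up volumes; right ranks are 1, 3, 2, 4, 5 versus 1, 2, 3, 4, 5 on the left.
left = np.array([4100.0, 4800.0, 5200.0, 5900.0, 6400.0])
right = np.array([3900.0, 5000.0, 4500.0, 5300.0, 7000.0])
rho, p = left_right_spearman(left, right)
print(rho, p)
```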

We also compare the cumulative distribution functions of the volumes by group, which may also indicate a stochastic ordering, for the (MDD, Ctrl) and (HR, Ctrl) pairs only, since the (MDD, HR) pairs are dependent. See Table 5 for the associated p-values. Observe that none of the p-values is significant.
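The K-S statistic is the largest vertical gap between the two empirical cdfs. The sketch below computes that gap by hand and compares it with `scipy.stats.ks_2samp` (two-sided by default) on synthetic volumes; the helper name is hypothetical.

```python
import numpy as np
from scipy import stats


def ks_distance(x, y):
    """Largest vertical gap between the two empirical cdfs (the K-S statistic)."""
    x, y = np.sort(x), np.sort(y)
    grid = np.concatenate([x, y])
    fx = np.searchsorted(x, grid, side="right") / x.size  # empirical cdf of x
    fy = np.searchsorted(y, grid, side="right") / y.size  # empirical cdf of y
    return np.max(np.abs(fx - fy))


rng = np.random.default_rng(6)
mdd_vol = rng.normal(5000.0, 600.0, 20)
ctrl_vol = rng.normal(5300.0, 600.0, 28)
res = stats.ks_2samp(mdd_vol, ctrl_vol)
print(res.statistic, ks_distance(mdd_vol, ctrl_vol), res.pvalue)
```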

Notice that except for left-right asymmetry and the correlation between left and right volumes (for each group), none of the comparisons is significant at the .05 level. But this does not necessarily imply a lack of VMPFC shape differences between groups, as volume measures only one aspect of size. The lack of significant group differences in volumes might be due to the fact that the data consist of age-matched female subjects, whose VMPFCs might be very similar in size. Furthermore, if the number of subjects per group were increased, significant group differences, if they exist, would be more likely to be detected. Next, we analyze various descriptive measures (summary statistics) based on GM LCDM distances.

3.2 Analysis of Descriptive Measures of LCDM Distances of GM of VMPFCs 

In this section we analyze some other measures that are more directly associated with LCDM distances. We compute descriptive measures (summary statistics) of the LCDM distances for each person; among them are the median, mode, range, and variance of the LCDM distances. We conduct on these descriptive measures the tests that we used for volumes in Section 3.1.

Note that each of these descriptive measures conveys information about some aspect of the morphometry (shape and size) of VMPFCs. We provide these analyses to demonstrate how LCDM distances can be used as a simple comparative tool.

The median of LCDM distances for VMPFCs yields a central distance measure, i.e., the distance from the "center" of the VMPFC GM to the GM/WM interface. The median distances for the left and right VMPFC gray matter of subject $j$ in group $i$ are denoted as $\mathrm{med}(D^L_{ij})$ and $\mathrm{med}(D^R_{ij})$, respectively. We use the median distance rather than the mean distance here because LCDM distances are right-skewed, and the median, being more robust to extreme values than the mean, is a better measure of centrality. See Figure 4 for the right skewness of the distances in the scatterplot and Figure 6 for the kernel density plots of LCDM distances for left and right VMPFCs by subject. The tests indicate that there are no group differences in the distributions of the median distances for either left or right VMPFCs, no significant left-right asymmetry, and no significant difference between the cdfs of median distances of groups for either left or right VMPFCs. HOV is rejected only for HR and Ctrl left median distances (p = .0261). Furthermore, left and right median distances are significantly positively correlated within the MDD, HR, and Ctrl groups, while the (MDD, HR) left (and right) median distances are not.

The mode of a data set, as a descriptive statistic, is the most frequent observation in the data set. To make it more meaningful for our data, we rounded the distances to one decimal place. Hence the mode corresponds to the tenth of a millimeter that contains the largest number of GM voxel distances. For instance, if the mode for a subject is 2.2 mm, more distances fall in the interval [2.2 mm, 2.3 mm] than in any other such interval. More precisely, the mode of LCDM distances for a VMPFC yields the distance of the "widest" strip parallel to the GM/WM interface. The modes of the distances for the left and right VMPFC gray matter of subject $j$ in group $i$ are denoted as $\mathrm{mode}(D^L_{ij})$ and $\mathrm{mode}(D^R_{ij})$, respectively. The tests indicate that there are no group differences in the mode of the distances for either left or right VMPFCs, no significant left-right asymmetry, no significant variance difference in the mode of distances between groups, and no significant difference between the cdfs of the modes of the distances between groups for either left or right VMPFCs. There is mild positive correlation between left and right modes in the Ctrl group, and between MDD and HR right modes.
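A sketch of the mode computation. It assumes that "rounding to one decimal place" means assigning each distance to the tenth-of-a-millimeter bin $[k/10, (k+1)/10)$, matching the interval reading in the example above; replacing the `np.floor(...)` line with rounding (`np.round(d / width)`) would instead center each bin on the reported value. The helper name and input values are hypothetical.

```python
import numpy as np


def lcdm_mode(distances, width=0.1):
    """Lower edge of the bin [k*width, (k+1)*width) holding the most distances.

    Ties are broken in favour of the smallest distance bin.
    """
    d = np.asarray(distances, dtype=float)
    # np.round guards against floating-point error, e.g. 2.3 / 0.1 = 22.999999999999996
    bins = np.floor(np.round(d / width, 9)).astype(int)
    values, counts = np.unique(bins, return_counts=True)  # values are sorted
    return values[np.argmax(counts)] * width


print(lcdm_mode([2.21, 2.25, 2.29, 1.0, 3.0]))  # three values fall in [2.2, 2.3)
```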

The range of LCDM distances for VMPFCs yields a rough measure of the "thickness" of the GM of the VMPFC. We use the range (maximum LCDM distance minus minimum LCDM distance) rather than the maximum distance here, although the latter is conceivably also a reasonable choice. The ranges of distances for the left and right VMPFC gray matter of subject $j$ in group $i$ are denoted as $\mathrm{range}(D^L_{ij})$ and $\mathrm{range}(D^R_{ij})$, respectively. The tests indicate that there are no group differences in the range of distances (thickness) for either left or right VMPFCs, no significant differences between the cdfs of groups for either left or right VMPFCs, and HOV of the ranges of groups is not rejected. Left distance ranges are significantly larger than right distance ranges, overall and for each group. Furthermore, there is mild positive correlation between left and right ranges in the Ctrl group, and between MDD and HR right ranges.

The variance of LCDM distances for a VMPFC yields a measure of the "variation" of the size of the GM in the VMPFC. We use variances rather than standard deviations here, since both yield the same results under rank-based nonparametric tests. The variances of distances for the left and right VMPFC gray matter of subject $j$ in group $i$ are denoted as $\mathrm{Var}(D^L_{ij})$ and $\mathrm{Var}(D^R_{ij})$, respectively. The tests indicate that there is no group difference in the variance of distances (size variation) for either left or right VMPFCs, no significant difference between the cdfs of variances of distances between groups for either left or right VMPFCs, and HOV of the variances of groups is not rejected. There is significant left-right asymmetry in the variance of distances, with left variances significantly larger than right variances, overall and for each group. Furthermore, there is mild correlation between MDD left and right variances.
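The per-subject median, range, and variance can be sketched as below. The sample variance (`ddof=1`) is an assumption, since the text does not specify the divisor. The last lines illustrate why variance and standard deviation are interchangeable under rank-based tests: the square root is increasing, so both give identical ranks.

```python
import numpy as np
from scipy import stats


def descriptive_measures(distances):
    """Per-subject median, range and variance of GM LCDM distances."""
    d = np.asarray(distances, dtype=float)
    return {
        "median": float(np.median(d)),
        "range": float(d.max() - d.min()),     # rough "thickness"
        "variance": float(np.var(d, ddof=1)),  # sample variance (ddof=1 assumed)
    }


summary = descriptive_measures([0.4, 1.1, 1.9, 2.6, 4.0])
print(summary)

# Variances and standard deviations of several subjects rank identically.
variances = np.array([1.8, 0.9, 2.5, 1.2])
print(stats.rankdata(variances), stats.rankdata(np.sqrt(variances)))
```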

Although descriptive statistics of LCDM distances measure some morphometric aspect of VMPFCs, they 
usually fail to discriminate the healthy VMPFCs from depressed VMPFCs. One reason for this is the fact 
that the subjects in our data set are age-matched female twins, who potentially have similar size VMPFCs. 
Moreover, this might be partly due to the size of groups; i.e., if we had more participants in the study, these 
measures are more likely to yield significant differences between the groups. On the other hand, by only 
using a descriptive summary statistic of thousands of distances for each person, we essentially lose most of 
the information LCDM measures convey. To avoid this over-summarization, we will use all the distances in 
our analysis in the next section. 

One could also use other descriptive measures of the LCDM distances, such as the inter-quartile range (IQR), skewness, and kurtosis.



4 Pooling LCDM Distances 

Since the descriptive measures of LCDM distances are summary statistics, they tend to oversimplify the data [33], and hence we lose most of the information conveyed by the LCDM distances. To avoid this information loss, we pool the LCDM distances of subjects from the same group or condition; specifically, we pool the LCDM distances of all left MDD VMPFCs into one group, those of all left HR VMPFCs into another, and those of all left Ctrl VMPFCs into a third. Formally,

$$\{D^L_{i\ell}\}_{\ell} = \bigcup_{j=1}^{n_i} \{D^L_{ijk}\}_{k}, \quad i = 1, 2, 3,$$

where $D^L_{i\ell}$ is the $\ell$th distance value among the pooled distances from subjects in group $i$. We pool the right VMPFC LCDM distances in a similar fashion and denote the right pooled distances as $D^R_{i\ell}$.
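The pooling step amounts to concatenating each subject's distance array into a single array per group. In this minimal sketch, the subject identifiers and the dictionary layout are hypothetical. Subject labels are discarded by the concatenation, which is why the within-subject dependence discussed earlier remains hidden in the pooled sample.

```python
import numpy as np


def pool_by_group(distances_by_subject, group_of):
    """Concatenate per-subject distance arrays into one array per group.

    distances_by_subject : dict subject id -> 1-D array of LCDM distances
    group_of             : dict subject id -> group label ("MDD", "HR", "Ctrl")
    """
    parts = {}
    for subject, d in distances_by_subject.items():
        parts.setdefault(group_of[subject], []).append(np.asarray(d, dtype=float))
    return {group: np.concatenate(arrays) for group, arrays in parts.items()}


left = {"m1": [0.2, 1.4], "m2": [2.1], "c1": [0.9, 3.3]}
groups = {"m1": "MDD", "m2": "MDD", "c1": "Ctrl"}
pooled_left = pool_by_group(left, groups)
print(pooled_left)
```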

One of the underlying assumptions is that the distances from the VMPFCs of subjects with MDD have the same distribution, those of HR subjects have the same distribution, and so do those of the Ctrl group. In other words, we assume that the $D^L_{ij,k}$ are identically distributed for all $j = 1, \ldots, n_i$ and each $i = 1, 2, 3$. So $D^L_{ij,k} \sim F^L_i$ for all $j, k$, and likewise $D^R_{ij,k} \sim F^R_i$ for all $j, k$. Hence the pooled distances are distributed as $D^L_{i,\ell} \sim F^L_i$ and $D^R_{i,\ell} \sim F^R_i$ for $i = 1, 2, 3$. We make this assumption under the presumption that the morphometry of VMPFCs is affected by the condition in a similar way, and hence age- and gender-matched subjects with the same condition should have VMPFCs of similar morphometry. As a precautionary step, in Section 4.1 we identify the extreme (outlier) subjects, i.e., the subjects whose VMPFCs have distance distributions very different from those of the rest, by (subjectively) comparing kernel density estimates.

4.1 Outlier Detection by Using Kernel Density Plots 

When pooling the distances for subjects in a particular group, we implicitly assume that the distances for subjects in the same group have identical distributions. As a precautionary step, we identify the extreme (outlier) subjects, i.e., the subjects whose VMPFCs have distance distributions very different from those of the rest. In Figure 4, we observe that in the left VMPFC distances, MDD subjects 8 and 11, HR subjects 3 and 18, and Ctrl subjects 6, 10, 11, 12, and 22 seem to differ most, with Ctrl subjects 11 and 12 being "thinner" and the rest being "thicker" than the other left VMPFCs. In the right VMPFC distances, on the other hand, the VMPFCs of MDD subject 4, HR subject 5, and Ctrl subjects 12 and 21 seem to differ most, with those of HR subject 5 and Ctrl subject 12 being thinner and the others being thicker than the rest of the right VMPFCs. Note that Figure 4 provides only a preliminary assessment of the reliability of LCDM distances, since it does not show the distributional behavior of the distances but only points to problems with small (negative) and large (positive) distances. Hence the kernel density estimates (or normalized histograms) can serve as a better exploratory tool to detect outliers. See Figure 5 for the kernel density estimates of LCDM
distances plotted by subject. Notice that these kernel density estimates are normalized for volume, as each density curve has unit area under it. Recall also that Figure 4 provides some insight into the kurtosis (the thickness of the left and right tails) of the distributions of LCDM distances. Using both Figures 4 and 6, we observe that the LCDM distances of some subjects have distributions very different from those of the others; i.e., they are outliers. However, although Figure 4 provides information about the tails (i.e., small or large distances), Figure 5 is more reliable for detecting outliers, as it is normalized for volume and provides information for all distance values. The VMPFCs of outlier subjects are extremely different in shape from those of the remaining subjects in each group. Hence an outlier VMPFC does not represent an average VMPFC in its group, and this discrepancy (extremeness) might be due to some other factor affecting that subject only. A careful investigation shows that, among the left VMPFCs, MDD subjects 1 and 9, HR subjects 3 and 12, and Ctrl subjects 6, 10, 11, 12, and 19 are outliers, while among the right VMPFCs, MDD subjects 7, 11, 13, 15, 18, 19, and 20, HR subjects 4, 8, and 14, and Ctrl subjects 4, 10, 11, 12, 25, and 26 seem to be outliers. We therefore remove these subjects from our pooled distance data sets, but perform the analysis both on the pooled distances with all subjects included and on the pooled distances without the outliers, and remark on how the outliers influence the comparisons. Notice that HR left subject 3, Ctrl left subjects 6, 10, 11, and 12, and Ctrl right subject 12 are deemed outliers by both Figures 4 and 5. Observe also that LCDM distances (more precisely, their normalized histograms or kernel density estimates) can be used as an exploratory tool to detect outliers.
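The outlier screening above was done by visually comparing kernel density estimates. As a hypothetical numerical aid (not the procedure used here), one could rank the subjects in a group by the L1 distance between each subject's Gaussian kernel density estimate and that of the remaining subjects' pooled distances. The sketch below assumes a bandwidth of 0.25 mm and an evaluation grid on [-1, 6] mm, both arbitrary choices; the function names are illustrative, and high-ranking subjects would still need to be inspected by eye.

```python
import math
from typing import Dict, List, Sequence, Tuple


def gaussian_kde(sample: Sequence[float], grid: Sequence[float], h: float) -> List[float]:
    """Gaussian kernel density estimate of `sample` at each grid point (bandwidth h).

    Each curve has unit area, so subjects with different voxel counts
    (volumes) are compared on the same scale.
    """
    c = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return [c * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in sample) for g in grid]


def kde_discrepancy_ranking(
    group: Dict[int, List[float]],
    h: float = 0.25,
    lo: float = -1.0,
    hi: float = 6.0,
    points: int = 141,
) -> List[Tuple[int, float]]:
    """Rank subjects by the L1 distance between their KDE and the KDE of the
    other subjects' pooled distances (leave-one-out), most atypical first."""
    step = (hi - lo) / (points - 1)
    grid = [lo + k * step for k in range(points)]
    scores = []
    for subject, dists in group.items():
        rest = [d for other, ds in group.items() if other != subject for d in ds]
        f_subject = gaussian_kde(dists, grid, h)
        f_rest = gaussian_kde(rest, grid, h)
        # Riemann-sum approximation of the integral of |f_subject - f_rest|.
        l1 = step * sum(abs(a - b) for a, b in zip(f_subject, f_rest))
        scores.append((subject, l1))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy group (made-up distances, mm): subject 4 is shifted to larger distances.
toy_group = {
    1: [1.0, 1.5, 2.0, 2.5],
    2: [1.1, 1.6, 2.1, 2.4],
    3: [0.9, 1.4, 1.9, 2.6],
    4: [4.0, 4.5, 5.0, 5.5],
}
ranking = kde_discrepancy_ranking(toy_group)
```

The L1 distance between two densities lies between 0 and 2, so a score near 2 indicates almost no overlap with the rest of the group.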

Here it may seem a little excessive to use the term "outlier", as the subjects defined as outliers are numerous: approximately 15% of left VMPFCs and over 20% of right VMPFCs are classified as outliers. This proportion might seem too high to treat these subjects as outliers, suggesting that some form of mixed distribution modeling may be more appropriate. Although keeping the outliers changes the results considerably compared to deleting them, for the methods based on pooling the distances we recommend removing the outliers, perhaps in a more conservative manner than ours, because the basic premise of pooling is the similarity of the distance distributions for VMPFCs of the same group. Modeling the distances with mixed distributions is a topic of ongoing research.

See Figure 7 for the kernel density estimates of pooled LCDM distances when the outliers are removed. See also Table 6 for the corresponding sample sizes, means, and standard deviations of the pooled LCDM distances, overall and for each group. Observe that the density profiles of LCDM distances for the left VMPFCs of MDD and HR subjects seem very similar, while both differ from that of the Ctrl subjects. In right VMPFCs, on the other hand, there seems to be more separation between the density profiles. After removing the extreme subjects, the sample sizes of LCDM distances decrease, while the medians and standard deviations (hence the variances) increase for all left and right groups. Furthermore, the mean LCDM distances decrease for left VMPFCs and increase for right VMPFCs. However, the order of the mean distances for left and right VMPFCs is the same with all subjects included and with the outliers removed. For left VMPFCs, the order of the means and standard deviations is HR < MDD < Ctrl (a more precise notation would be $\operatorname{mean}(D^L_{\mathrm{HR}}) < \operatorname{mean}(D^L_{\mathrm{MDD}}) < \operatorname{mean}(D^L_{\mathrm{Ctrl}})$, which we shorten for convenience), while the order of the medians is MDD < HR < Ctrl. The change in the order of means and medians is due to the different levels of right skewness of the distributions. For right VMPFCs, the order of the means, medians, and standard deviations is HR < MDD < Ctrl. Thus, the outliers in the right VMPFCs, although they influence the means, medians, and variances, do not change their order.

4.2 Results 

4.2.1 The Equality of the Distributions of Pooled LCDM Distances 

First we address differences between the distributions in location but not in spread. Differences between the distributions in location (e.g., means or medians) of LCDM distances imply shape differences. Hence, we test the equality of the distributions of the left (pooled) distances between groups; i.e.,

$$H_o: F^L_1 = F^L_2 = F^L_3,$$

where $F^L_i$ is the distribution function of the left pooled distances for group $i = 1, 2, 3$; likewise for the right pooled distances.

The left and right pooled distances for each group are significantly non-normal, with $p_L < .0001$ for each test, where $p_L$ is the p-value for Lilliefors' test of normality (see, e.g., [43]), possibly due to heavy right skewness of the densities. Moreover, HOV is rejected with $p_{BF} < .0001$ for both left and right pooled distances, where $p_{BF}$ is the p-value for the B-F test. Hence nonparametric tests of group comparisons are more appropriate for these data. Note that equality of the distributions of the pooled distances can be attributed to similarity in the VMPFC shapes for all groups, but not vice versa (i.e., equality of the distributions does not necessarily imply morphometric similarity, only similarity in the distance structure of GM tissue with respect to the GM/WM surface). Notice that LCDM distances analyzed in this fashion provide morphometric information on cortical mantle thickness and shape, while the width (the extent of the VMPFC parallel to the GM/WM surface) is less relevant, because the comparison is based on the ranking of distances with respect to the GM/WM surface. For example, suppose two VMPFC tissues are composed of 100 and 1000 voxels with similar distances; then the test will detect no difference, although the morphometry is obviously different. Hence, as long as the voxels are at similar distances from the GM/WM surface, their abundance does not influence the test results.
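The insensitivity of rank-based tests to voxel abundance can be checked numerically. The minimal standard-library Python sketch below (function names hypothetical) computes the Mann-Whitney form of the Wilcoxon rank sum statistic for an idealized pair of tissues: 100 voxels, and 1000 voxels with exactly the same distance profile (every distance repeated 10 times). The statistic then sits exactly at its null mean $n_1 n_2 / 2$, so $z = 0$ and the test sees no difference despite the tenfold volume difference.

```python
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of `values`; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0  # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def rank_sum_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Mann-Whitney U for sample x and its z-score (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    sd = (n1 * n2 * (n1 + n2 + 1) / 12.0) ** 0.5
    return u, (u - n1 * n2 / 2.0) / sd


# Hypothetical tissues: 100 voxels, and 1000 voxels with exactly the same
# distance profile (every distance repeated 10 times).
small = [0.5 * (k % 12) + 0.003 * k for k in range(100)]
large = small * 10
u, z = rank_sum_u(small, large)
```

Here $U = 100 \cdot 1000 / 2 = 50000$ exactly, because each distance of the small tissue is tied with its ten copies in the large one.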

The equality of the distributions of the distances of left VMPFCs is rejected with $p_{KW} < .0001$, where $p_{KW}$ is the p-value for the K-W test, and likewise for right VMPFC distances ($p_{KW} < .0001$). Without removing the extreme subjects (i.e., when all subjects are included), we reach the same conclusions for right and left VMPFCs, with $p_{KW} < .0001$ for both. Hence, we perform pairwise comparisons by the Wilcoxon rank sum test for left (and right) VMPFC distances, using Holm's correction for multiple comparisons. The Holm-adjusted p-values for the simultaneous pairwise comparisons of left and right VMPFC distances are presented in Table 7. Observe that, both with all subjects included and with the outliers removed, MDD-left and HR-left distances are not significantly different ($p_W = .3022$ for the former and $p_W = .0776$ for the latter, where $p_W$ is the p-value for the Wilcoxon rank sum test), while both tend to be significantly smaller than Ctrl-left distances ($p_W < .0001$ for all). Hence, left VMPFC distances tend to decrease with depressive disorders, possibly due to thinning of the left VMPFCs. In right VMPFC distances, with all subjects included, we observe that MDD-right and HR-right distances are not significantly different from each other, while both tend to be significantly smaller than Ctrl-right distances ($p_W < .0001$ for both). When the outliers are removed, we observe that MDD-right distances tend to be significantly smaller than HR-right distances ($p_W = .0084$), which in turn tend to be significantly smaller than Ctrl-right distances ($p_W < .0001$ for both). Observe that the outliers, although they do not change the order of the mean pooled distances, do influence the results, in particular for the MDD and HR
groups. Looking at the kernel density estimates in Figure 5, we see that the outliers in the HR group are more similar to the MDD group, hence making the MDD and HR distance distributions more similar than they would otherwise be. Recall that we were not able to detect these differences by using volume or simple descriptive measures based on LCDM [33]. Thus, the pooled LCDM distances provide comparisons that are more powerful for detecting group differences.
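For reference, Holm's step-down correction used for the pairwise comparisons can be written in a few lines. This is a minimal standard-library sketch (function name hypothetical); the p-values in the example are made up, not those of Table 7. In practice, R's `p.adjust(p, method = "holm")` or `statsmodels.stats.multitest.multipletests(p, method="holm")` perform the same adjustment.

```python
from typing import Dict


def holm_adjust(pvalues: Dict[str, float]) -> Dict[str, float]:
    """Holm step-down adjusted p-values for a family of m tests.

    With raw p-values sorted as p_(1) <= ... <= p_(m), the adjusted value of
    p_(k) is max over l <= k of min(1, (m - l + 1) * p_(l)).
    """
    m = len(pvalues)
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    adjusted: Dict[str, float] = {}
    running_max = 0.0
    for l, (name, p) in enumerate(ranked):         # l is 0-based here
        candidate = min(1.0, (m - l) * p)           # (m - l) == m - (l + 1) + 1
        running_max = max(running_max, candidate)   # keep adjusted values monotone
        adjusted[name] = running_max
    return adjusted


# Hypothetical raw p-values for three pairwise Wilcoxon tests (not Table 7).
raw = {"MDD vs HR": 0.03, "MDD vs Ctrl": 0.001, "HR vs Ctrl": 0.02}
adj = holm_adjust(raw)
```

In this example the smallest p-value is multiplied by 3 and the middle one by 2, while the largest is raised to 0.04 by the monotonicity step.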

Since the densities of the distances are skewed right, these differences do not reflect the order of the mean distances, but rather the order of the median distances. Furthermore, in these analyses we ignore the influence of possible dependence between twin pairs due to genetic similarity.

4.2.2 Homogeneity of the Variances (HOV) of Pooled LCDM Distances 

Observe that the K-W and Wilcoxon tests suggest shape differences when the null hypothesis is rejected; in particular, the direction of the alternatives might indicate cortical thinning. Similarity of the morphometry of VMPFCs will cause similarity of LCDM distances, which in turn implies similarity of the variances of LCDM distances. The variance of distances is suggestive of morphometric variation in VMPFCs. So similar shapes and sizes imply similar variances, but not vice versa. For example, cortical thinning will reduce the variation in LCDM distances, and the larger the spread in the boundary (surface) of the VMPFC, the larger the variance of LCDM distances. Hence, we test the equality of the variances of the left (pooled) distances between groups; i.e.,

$$H_o: \operatorname{Var}(D^L_{1,\ell}) = \operatorname{Var}(D^L_{2,\ell}) = \operatorname{Var}(D^L_{3,\ell}),$$

where $\operatorname{Var}(D^L_{i,\ell})$ is the variance of the left pooled distances for group $i = 1, 2, 3$; likewise for the right distances. Both with all subjects included and with the extreme subjects removed, HOV is rejected with $p_{BF} < .0001$ for both left and right VMPFCs. See Table 8 for the corresponding Holm-adjusted p-values for the simultaneous pairwise comparisons. Both with all subjects included and with the extreme subjects removed, the order of the variances is HR < MDD < Ctrl for both left and right VMPFCs, with $p_{BF} < .0001$ for all six possible comparisons. This implies that morphometric variation in left and right VMPFCs is reduced in subjects suffering from, or at high risk for, depressive disorders compared to Ctrl subjects, and is smallest for HR subjects. Observe that both the Wilcoxon rank sum tests (for location) and the B-F tests (for variances) yield significant results with the same ordering between groups (HR < MDD < Ctrl), which might be due to cortical thinning, among other factors.
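Because the B-F test is simply a one-way ANOVA on absolute deviations from the group medians (as detailed in Section 5.1), its statistic is easy to compute directly. The following is a minimal standard-library sketch (function name hypothetical); it returns only the F statistic and its degrees of freedom, leaving the upper-tail F probability to a statistics library. As a cross-check, `scipy.stats.levene(..., center='median')` computes the same statistic together with its p-value.

```python
from statistics import mean, median
from typing import Sequence, Tuple


def brown_forsythe(*groups: Sequence[float]) -> Tuple[float, int, int]:
    """Brown-Forsythe statistic: one-way ANOVA F on |x - median(group)|.

    Returns (F, df_between, df_within); the p-value is the upper tail of the
    F(df_between, df_within) distribution.
    """
    devs = [[abs(x - median(g)) for x in g] for g in groups]
    k = len(devs)
    n_total = sum(len(d) for d in devs)
    grand = sum(sum(d) for d in devs) / n_total
    means = [mean(d) for d in devs]
    ss_between = sum(len(d) * (m - grand) ** 2 for d, m in zip(devs, means))
    ss_within = sum((x - m) ** 2 for d, m in zip(devs, means) for x in d)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy groups (made-up values): the second is twice as spread out as the first.
f_stat, df1, df2 = brown_forsythe([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

Using medians rather than means as centers is what makes the test robust to the heavy right skewness of the LCDM distances.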

4.2.3 Morphometric Left-Right Asymmetry with Pooled Distances 

LCDM distances can also be used to detect left-right morphometric asymmetry, which might be due to shape or size asymmetry between left and right VMPFCs. If the left and right VMPFCs are similar, then the distributions of their LCDM distances will be similar, but not vice versa. Conversely, if the distributions of the left and right LCDM distances differ, then there is evidence of morphometric left-right asymmetry, which can therefore be detected using LCDM distances (with the Wilcoxon rank sum test). In terms of size asymmetry, LCDM emphasizes mantle thickness asymmetry rather than mantle width asymmetry.



Hence we test

$$H_o: F^L_i = F^R_i$$

for each $i = 1, 2, 3$. See Table 9 for the associated p-values, which are adjusted by Holm's correction method.
Observe that when all subjects are included, left distances are significantly larger than right distances, both overall and by group, with $p_W < .0001$ for each test. When the extreme subjects (outliers) are removed, Ctrl and MDD VMPFC distances show significant left-right asymmetry, with left distances being significantly smaller than right distances ($p_W < .0001$ for both), while HR-left distances are significantly larger than HR-right distances ($p_W = .0015$). Hence, we conclude that there is significant left-right asymmetry in LCDM distances. However, the direction of the asymmetry differs between MDD and HR subjects, while it is the same for MDD and Ctrl subjects. This suggests that the cortical mantle in the left VMPFC is thinner than in its right counterpart for MDD and Ctrl subjects, and thicker for HR subjects. Notice that the inclusion of outliers (i.e., when all subjects are included) influences the MDD and Ctrl groups to the extent that the direction of the asymmetry is reversed.

4.2.4 Stochastic Ordering of Pooled LCDM Distances 

Recall that we used Wilcoxon tests to test the null hypothesis of equality of the distributions, i.e., $H_o: F_X = F_Y$, where $F_X$ and $F_Y$ are the distributions of the variables $X$ and $Y$, respectively. For one-sided alternatives, the p-values based on the Wilcoxon test are complementary (i.e., the p-values for the "<" and ">" alternatives add up to (approximately) 1). Hence the p-value can be significant for only one type of directional alternative. Furthermore, when the null hypothesis is rejected, the Wilcoxon test implies an ordering in a location parameter such as the mean or median. Stochastic ordering, if present, can be deduced from the direction of the alternative together with cdf plots. See Figure 8 for the cdf plots of the LCDM distances for each subject and Figure 9 for the cdf plots of the pooled distances. Observe that, at the current resolution, the cdf plots for the pooled distances are not suggestive of stochastic ordering. We can also use Kolmogorov-Smirnov (K-S) tests for $H_o: F_X = F_Y$. Unlike the Wilcoxon test, the K-S test yields p-values that are not complementary for the one-sided alternatives (i.e., they do not add up to 1). Hence, the p-values can be significant for both or for neither of the directional alternatives. This results from the fact that the order of the cdfs $F_X$ and $F_Y$ can differ at different distance values on the horizontal axis. Moreover, if the K-S p-value is significant for only one of the one-sided alternatives, then we can deduce a stochastic ordering. If the p-values are insignificant for both one-sided alternatives, or significant for both, there is no stochastic ordering; but in the first case equality of the distributions is retained, while in the latter the distributions differ significantly. So we also apply the K-S test to our data set to compare the cumulative distribution functions of the distances by group, which might also reveal a stochastic ordering. The null hypotheses are

$$H_o: F^L_i = F^L_j$$

for each $(i, j) \in \{(1, 2), (1, 3), (2, 3)\}$. See Table 10 for the associated p-values, where the tests for each alternative are adjusted by Holm's correction method. Observe that, with all subjects included, the cdf of MDD-left distances is significantly smaller than those of Ctrl-left and HR-left distances. Furthermore, the cdfs of MDD-left and HR-left distances are significantly different from each other, with both one-sided alternatives significant, which suggests that the order of the cdfs changes at different distance values. When the extreme subjects are removed, the cdf of Ctrl-left VMPFC distances is significantly larger than those of MDD-left and HR-left distances, and the cdf of MDD-left distances is significantly smaller than that of HR-left distances. Thus, we conclude that the stochastic ordering of left distances is HR $<_{ST}$ MDD $<_{ST}$ Ctrl; i.e., HR-left distances are more likely to be smaller than MDD-left and Ctrl-left distances, and MDD-left distances are more likely to be smaller than Ctrl-left distances. In other words, the left VMPFCs of HR subjects are more likely to be thinner than those of MDD subjects, which in turn are more likely to be thinner than those of Ctrl subjects.

With all subjects included, the cdf of MDD-right distances is significantly smaller than that of HR-right distances. But the K-S test yields significant results for both one-sided alternatives for the (MDD, Ctrl) and (HR, Ctrl) pairs. This implies, for example, that the cdfs of MDD-right and Ctrl-right distances differ, but with no stochastic ordering between them. Furthermore, the differences between the cdfs of the groups change over the distance values; that is, for small distance values the order is Ctrl < MDD < HR, while for large distance values the order is HR < MDD < Ctrl. When the extreme subjects are removed, the cdfs have the order Ctrl < MDD < HR. This implies the stochastic ordering HR $<_{ST}$ MDD $<_{ST}$ Ctrl; i.e., HR-right distances are more likely to be smaller than MDD-right and Ctrl-right distances, and MDD-right distances are more likely to be smaller than Ctrl-right distances. That is, the right VMPFCs of HR subjects are more likely to be thinner than those of MDD subjects, which in turn are more likely to be thinner than those of Ctrl subjects. Thus, applying the K-S test to LCDM distances may reveal a stochastic ordering of LCDM distances, or the lack of one.
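The distinction between complementary Wilcoxon p-values and non-complementary one-sided K-S p-values can be seen by computing both one-sided Smirnov statistics directly. The minimal Python sketch below (function name hypothetical) uses the large-sample approximation $p \approx \exp(-2 n_{\mathrm{eff}} D^2)$ with $n_{\mathrm{eff}} = n_x n_y / (n_x + n_y)$ rather than exact p-values, and applies no Holm adjustment. Library implementations such as `scipy.stats.ks_2samp(x, y, alternative=...)` give more accurate p-values, but their sign conventions for "less" and "greater" should be checked against the documentation before reading off a stochastic ordering.

```python
import math
from bisect import bisect_right
from typing import Sequence, Tuple


def one_sided_ks(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float]:
    """One-sided two-sample Smirnov statistics with large-sample p-values.

    d_plus  = max_t [F_x(t) - F_y(t)]  (cdf of x above that of y somewhere)
    d_minus = max_t [F_y(t) - F_x(t)]  (cdf of x below that of y somewhere)
    p ~= exp(-2 * n_eff * d**2), with n_eff = n_x * n_y / (n_x + n_y).
    """
    xs, ys = sorted(x), sorted(y)
    d_plus = d_minus = 0.0
    for t in xs + ys:  # both empirical cdfs only jump at observed points
        diff = bisect_right(xs, t) / len(xs) - bisect_right(ys, t) / len(ys)
        d_plus = max(d_plus, diff)
        d_minus = max(d_minus, -diff)
    n_eff = len(xs) * len(ys) / (len(xs) + len(ys))
    return (d_plus, d_minus,
            math.exp(-2.0 * n_eff * d_plus ** 2),
            math.exp(-2.0 * n_eff * d_minus ** 2))


# Made-up samples with similar centers but different spreads: the cdfs cross,
# so both one-sided alternatives are significant and there is no stochastic order.
narrow = [2.0 + 0.01 * k for k in range(100)]
wide = [0.5 + 0.04 * k for k in range(100)]
d_plus, d_minus, p_plus, p_minus = one_sided_ks(narrow, wide)
```

For a pure shift, by contrast, one of the two statistics is zero (its p-value is 1), which is the situation in which a stochastic ordering can be read off.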

5 The Influence of Assumption Violations on the Tests: A Monte Carlo Analysis

5.1 The Underlying Assumptions for the Tests 

In our analysis, we have used various (parametric and nonparametric) tests without addressing the validity of their underlying assumptions. Let $\mathcal{X}_i = \{X_{i1}, \ldots, X_{in_i}\}$, $i = 1, \ldots, m$, be $m$ samples of sizes $n_i$ from their respective populations. Then the assumptions for the K-W test of distributional equality of several independent samples are as follows [40]:

1. All samples, $\mathcal{X}_i$, are random samples from their respective populations; i.e., there is independence within each sample.

2. There is mutual independence among the various samples. For example, $\mathcal{X}_i$ and $\mathcal{X}_j$ are independent for all $i \neq j$.

3. The measurement scale is at least ordinal. 

4. Under the null hypothesis, the population distributions are identical. That is, $X_{ij} \sim F$ for all $i = 1, \ldots, m$ and $j = 1, \ldots, n_i$.

The assumptions for the Wilcoxon rank sum test are the same, except that there are only two samples. For the K-S test for cdf comparisons, the first three assumptions are the same, but assumption 4 is:

4. For the K-S test to be exact, the random variables are assumed to be continuous.

For discrete random variables, the K-S test is still valid, but becomes a little conservative [40].

The B-F test for HOV is the regular one-way ANOVA test on the absolute deviations from the sample medians. That is, the usual ANOVA test is applied to the samples $\mathcal{X}^{med}_i = \{|X_{ij} - \operatorname{med}(\mathcal{X}_i)|\}_{j=1}^{n_i}$ for $i = 1, \ldots, m$. Hence the assumptions for the B-F test are those of the ANOVA F-test applied to the absolute deviations from medians:



1. All samples of absolute deviations from medians, $\mathcal{X}^{med}_i$, are random samples from their respective populations; i.e., there is independence within each sample of deviations.

2. There is mutual independence among the various samples of deviations. For example, $\mathcal{X}^{med}_i$ and $\mathcal{X}^{med}_j$ are independent for all $i \neq j$.

3. The measurement scale is at least interval and the deviations are normally distributed. 

4. The homogeneity of the variances of the deviations; i.e., the variances of the deviations for each sample 
are identical. 



It has been shown [46] that the B-F test gives quite accurate error rates even when assumption 3 is violated. However, the robustness of the B-F test against violations of assumption 4 is not clear, since this test is itself a test of HOV and depends on the HOV of the absolute deviations [47].

The assumptions for the Wilcoxon signed rank test are as follows [40]:



1. The distribution of each paired difference is symmetric. 



2. The paired differences are mutually independent. 

3. The measurement scale is at least interval. 

4. The paired differences have the same mean, which is usually zero. 

Note that these assumptions are reasonably satisfied for morphometric measures such as the volume, surface area, median, mode, range, and variance of the LCDM distances. Hence, we can safely use the above tests for these measures, except for the possible dependence between MDD and HR twins. However, pooled LCDM distances have spatial dependence (or correlation), so independence within each sample does not hold, although the other assumptions for the K-W, Wilcoxon rank sum, B-F, and K-S tests are reasonably valid. In the next section, we investigate the influence of these assumption violations on the results by Monte Carlo simulations.

5.2 Simulation of Data that Resemble LCDM Distances 

In this section, we investigate the influence on the tests of the assumption violations due to the spatial correlation and non-normality inherent in LCDM distances. The most crucial step in a Monte Carlo simulation is being able to generate distances resembling the LCDM distances of GM tissue in VMPFCs; i.e., capturing the true randomness in LCDM distances. For demonstrative purposes, we pick the left VMPFC of
HR subject 1. Recall that the LCDM distances for the left VMPFC of HR subject 1 were denoted by $\mathcal{D}^L_{21}$. We rearrange the distances in $\mathcal{D}^L_{21}$ so that the first stack of distances is in $I_0 := [-1, 0.5]$ mm, the second stack is in $I_1 := (0.5, 1.0]$ mm, the third stack is in $I_2 := (1.0, 1.5]$ mm, and so on, until the last stack, which is in $I_{11} := (5.5, 6.0]$ mm. Let $\nu_i$ be the number of distances that fall in $I_i$; i.e., $\nu_i = |\mathcal{D}^L_{21} \cap I_i|$ for $i = 0, 1, \ldots, 11$. Hence $\nu = (\nu_0, \nu_1, \ldots, \nu_{11}) = (2059, 1898, 1764, 1670, 1469, 1268, 814, 417, 142, 81, 61, 16)$. Then we merge these stacks into one group by appending them one after another in order of increasing distance. See Figure 10, where the top graph is for the merged distances and the bottom graph is for the distances sorted in ascending order.

A possible Monte Carlo simulation for these distances can be performed as follows. We generate $n$ numbers in $\{0, 1, 2, \ldots, 11\}$ with replacement, with probabilities proportional to the above frequencies $\nu_i$. The number of distances for the left VMPFC of HR subject 1 is 11659, and we generate $n = 10000$ such numbers. Then, for each $i \in \{0, 1, 2, \ldots, 11\}$, we generate as many $U(0, 1)$ numbers as the number of times $i$ occurs in the generated sample of 10000 numbers, and add these uniform numbers to $i$. Then we divide each resulting value by 2 so that the range of the generated distances matches $[0, 6.0]$ mm, which is the range of the distances for the left VMPFC of HR subject 1. More specifically, we independently generate $n$ numbers from $\{0, 1, 2, \ldots, 11\}$ with the discrete probability mass function $P_N(N_j = i) = \nu_i/11659$ for $i = 0, 1, \ldots, 11$ and $j = 1, 2, \ldots, n$. Let $n_i$ be the frequency of $i$ among the $n$ generated numbers, for $i = 0, 1, 2, \ldots, 11$; hence $n = \sum_{i=0}^{11} n_i$. Then we generate $U_{ik} \sim U(0, 1)$ for $k = 1, 2, \ldots, n_i$ for each $i$, and the desired distance values are $d_{ik} = (i + U_{ik})/2$. Hence the set of simulated distances is

$$\mathcal{D}_{mc} = \left\{ d_{ik} = (i + U_{ik})/2 : U_{ik} \sim U(0, 1),\ k = 1, \ldots, n_i,\ i = 0, 1, \ldots, 11 \right\}.$$

A sample of distances generated in this fashion is plotted in Figure 11, where the top plot is for the distances as they are generated in each bin (stack) of size 0.5 mm and the bottom plot is for the distances sorted in ascending order. Comparing Figures 10 and 11, we observe that the Monte Carlo scheme described above generates distances that resemble the LCDM distances for the left VMPFC of HR subject 1. Therefore, distances generated in this fashion, with suitable modification of parameters such as the $\nu_i$, resemble the distances of VMPFCs from real subjects.
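The null-case generator can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the reported simulations: it uses the bin counts $\nu_i$ as listed in Section 5.2.1 (with $\nu_4 = 1469$, so that they sum to the stated total of 11659), and the function name `simulate_lcdm_like` is hypothetical. Drawing one uniform offset per sampled bin index is equivalent to first counting the $n_i$ and then drawing $n_i$ uniforms for bin $i$.

```python
import random
from typing import List, Optional, Sequence

# Bin counts nu_0, ..., nu_11 (0.5 mm bins) for the left VMPFC of HR subject 1,
# as listed in Section 5.2.1; they sum to 11659.
NU = (2059, 1898, 1764, 1670, 1469, 1268, 814, 417, 142, 81, 61, 16)


def simulate_lcdm_like(n: int, nu: Sequence[int] = NU,
                       rng: Optional[random.Random] = None) -> List[float]:
    """Simulate n distances d = (i + U) / 2 with P(i) = nu_i / sum(nu), U ~ U(0, 1).

    Drawing one uniform per bin index is equivalent to first counting the
    n_i occurrences of each i and then drawing n_i uniforms for that bin.
    """
    rng = rng if rng is not None else random.Random()
    bins = rng.choices(range(len(nu)), weights=nu, k=n)
    return [(i + rng.random()) / 2.0 for i in bins]


distances = simulate_lcdm_like(10000, rng=random.Random(2024))
distances_sorted = sorted(distances)  # analogue of the "sorted" panel
```

Passing an explicit `random.Random` instance keeps each Monte Carlo replicate reproducible without touching the global generator.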

5.2.1 Empirical Size Estimates for the Multi-Sample Case 

For the null hypothesis of the multi-sample case, which states the equality of the distributions of LCDM distances, we generate three samples $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Z}$ of sizes $n_x$, $n_y$, and $n_z$, respectively. Each sample is generated as described above, with the sample sizes for the bins (stacks) selected to be proportional to the frequencies $\nu = (\nu_0, \nu_1, \ldots, \nu_{11}) = (2059, 1898, 1764, 1670, 1469, 1268, 814, 417, 142, 81, 61, 16)$, i.e., those of the left VMPFC of HR subject 1. This is done without loss of generality, since any other VMPFC can be obtained either by a rescaling of the generated distances or by modifying the frequencies in $\nu$. That is, we generate $n$ numbers in $\{0, 1, 2, \ldots, 11\}$ with replacement, with probabilities proportional to the above frequencies $\nu_i$. Then, for each $i \in \{0, 1, 2, \ldots, 11\}$, we generate as many $U(0, 1)$ numbers as the number of times $i$ occurs in the generated sample of $n$ numbers, and add these uniform numbers to $i$. More specifically, we independently generate $n$ numbers from $\{0, 1, 2, \ldots, 11\}$ with the discrete probability mass function $P_N(N_j = i) = \nu_i/11659$ for $i = 0, 1, 2, \ldots, 11$ and $j = 1, \ldots, n$. Let $n_i$ be the frequency of $i$ among the $n$ generated numbers, for $i = 0, 1, 2, \ldots, 11$. Then we generate $U_{ik} \sim U(0, 1)$ for $k = 1, \ldots, n_i$ for each $i$, and the desired distance values are $d_{ik} = (i + U_{ik})/2$. Hence the set of simulated distances is

$$\mathcal{D}_{mc} = \left\{ d_{ik} = (i + U_{ik})/2 : U_{ik} \sim U(0, 1),\ k = 1, \ldots, n_i,\ i = 0, 1, \ldots, 11 \right\}.$$

We repeat these sample generations $N_{mc} = 10000$ times. We count the number of times the null hypothesis is rejected at the $\alpha = 0.05$ level for the B-F test of HOV, the K-W test of distributional equality, and the ANOVA F-tests (with and without HOV) of equality of mean distances, and thus obtain the estimated significance levels under $H_o$. The estimated significance levels for various values of $n_x$, $n_y$, and $n_z$ are provided in Table 11, where $\hat\alpha_{BF}$ is the empirical size estimate for the B-F test, $\hat\alpha_{KW}$ is that for the K-W test, $\hat\alpha_{F_1}$ is that for the ANOVA F-test with HOV, and $\hat\alpha_{F_2}$ is that for the ANOVA F-test without HOV; furthermore, $\hat\alpha_{KW,F_1}$ is the proportion of agreement between the K-W test and the ANOVA F-test with HOV, i.e., the proportion of the 10000 Monte Carlo replicates in which both the K-W test and the ANOVA F-test with HOV reject the null hypothesis. Similarly, $\hat\alpha_{KW,F_2}$ is the proportion of agreement between the K-W test and the ANOVA F-test without HOV, and $\hat\alpha_{F_1,F_2}$ is the proportion of agreement between the ANOVA F-tests with and without HOV. Using the asymptotic normality of the proportions, we test the equality of the empirical size estimates with 0.05 and compare the empirical sizes pairwise. We observe that the K-W and B-F tests are both at the desired significance level, while the ANOVA F-tests with and without HOV are at the desired level or slightly conservative. Notice also that under $H_o$ the tests tend to be more conservative as the sample sizes increase. Hence, if the distances are not that different, i.e., if the frequencies of distances in each bin and the distances within each bin are identically distributed for each group, the inherent spatial correlation does not seem to influence the significance levels. Moreover, we observe that for LCDM distances the K-W and $F_1$ tests have significantly different rejection (hence acceptance) regions, because the proportion of agreement for these tests, $\hat\alpha_{KW,F_1}$, is significantly smaller than the minimum of $\hat\alpha_{KW}$ and $\hat\alpha_{F_1}$. Similarly, the K-W and $F_2$ tests have significantly different rejection (hence acceptance) regions, because the proportion of agreement for these tests, $\hat\alpha_{KW,F_2}$, is significantly smaller than the minimum of $\hat\alpha_{KW}$ and $\hat\alpha_{F_2}$. However, the $F_1$ and $F_2$ tests have about the same rejection (hence acceptance) regions, because the proportion of agreement for these tests, $\hat\alpha_{F_1,F_2}$, is not significantly different from the minimum of $\hat\alpha_{F_1}$ and $\hat\alpha_{F_2}$. This mainly results from the fact that the K-W and $F_1$ tests test different hypotheses, as do the K-W and $F_2$ tests, whereas the $F_1$ and $F_2$ tests basically test the same hypothesis.
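The size check described above compares each empirical size with 0.05 using the normal approximation to a binomial proportion. A minimal standard-library sketch follows, assuming the rejection indicators have already been collected over the $N_{mc}$ replicates; the function names are hypothetical, the counts in the example are made up (not Table 11), and the pairwise comparison of two empirical sizes is not shown.

```python
import math
from typing import Sequence, Tuple


def size_z_test(rejections: int, n_mc: int, alpha: float = 0.05) -> Tuple[float, float, float]:
    """Test H: true size == alpha from `rejections` out of `n_mc` Monte Carlo runs.

    Normal approximation to the binomial proportion; returns
    (alpha_hat, z, two-sided p-value).
    """
    alpha_hat = rejections / n_mc
    z = (alpha_hat - alpha) / math.sqrt(alpha * (1.0 - alpha) / n_mc)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # = 2 * (1 - Phi(|z|))
    return alpha_hat, z, p_two_sided


def agreement(rejected_a: Sequence[bool], rejected_b: Sequence[bool]) -> float:
    """Proportion of replicates in which both tests reject (e.g. alpha_hat_{KW,F1})."""
    return sum(a and b for a, b in zip(rejected_a, rejected_b)) / len(rejected_a)


# Hypothetical counts, not the values in Table 11.
exact = size_z_test(500, 10000)         # alpha_hat = 0.05
conservative = size_z_test(400, 10000)  # alpha_hat = 0.04
```

With $N_{mc} = 10000$, the standard error of an empirical size near 0.05 is about 0.0022, so an estimate of 0.04 already lies more than four standard errors below the nominal level.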

5.2.2 Empirical Power Estimates for Multi-Sample Case 

For the alternative hypotheses, we generate sample $\mathcal{X}$ as follows: for each $i \in \{0, 1, 2, \ldots, 11\}$, we generate as many $U(0, 1)$ numbers as the number of times $i$ occurs in the generated sample of $n_x$ numbers, and add these uniform numbers to $i$. For sample $\mathcal{Y}$, we generate numbers in $\{0, 1, 2, \ldots, 12\}$ with replacement, with probabilities proportional to the frequencies $\nu^y = (\nu^y_0, \ldots, \nu^y_{12})$, where $\nu^y_i$ is the $i$th value after the entries $|\nu_i - \eta_y|$ are sorted in descending order, for $i = 0, 1, 2, \ldots, 11$, and $\nu^y_{12} = 11659 - \sum_{i=0}^{11} |\nu_i - \eta_y|$. Then, for each $i \in \{0, 1, 2, \ldots, 12\}$, we generate as many $U(0, r_y)$ numbers as the number of times $i$ occurs in the generated sample of $n_y$ numbers, and add these uniform numbers to $i$. For sample $\mathcal{Z}$, we generate numbers in $\{0, 1, 2, \ldots, 12\}$ with replacement, with probabilities proportional to the frequencies $\nu^z = (\nu^z_0, \ldots, \nu^z_{12})$, where $\nu^z_i$ is the $i$th value after the entries $|\nu_i - \eta_z|$ are sorted in descending order, for $i = 0, 1, 2, \ldots, 11$, and $\nu^z_{12} = 11659 - \sum_{i=0}^{11} |\nu_i - \eta_z|$. Then, for each $i \in \{0, 1, 2, \ldots, 12\}$, we generate as many $U(0, r_z)$ numbers as the number of times $i$ occurs in the generated sample of $n_z$ numbers, and add these uniform numbers to $i$.

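More formally, the bin indices of the samples are generated as

$$\mathcal{N}_x = \{J_j \sim P_x : j = 1, \ldots, n_x\}, \quad \mathcal{N}_y = \{J_j \sim P_y : j = 1, \ldots, n_y\}, \quad \mathcal{N}_z = \{J_j \sim P_z : j = 1, \ldots, n_z\},$$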

where $P_x(J = i) = \nu_i/11659$, with $\nu_i$ the $i$th entry of $\nu$; $P_y(J = i) = \nu^y_i / \sum_{i=0}^{12} \nu^y_i$, with $\nu^y_i$ the $i$th entry of $\nu^y$; and $P_z(J = i) = \nu^z_i / \sum_{i=0}^{12} \nu^z_i$, with $\nu^z_i$ the $i$th entry of $\nu^z$. Let $n^x_i$ be the frequency of $i$ among the $n_x$ numbers generated from $P_x$, $n^y_i$ the frequency of $i$ among the $n_y$ numbers generated from $P_y$, and $n^z_i$ the frequency of $i$ among the $n_z$ numbers generated from $P_z$. Then we generate $U_{ik} \sim U(0, 1)$ for $k = 1, \ldots, n^x_i$ for each $i$. Hence the set of simulated distances for $\mathcal{X}$ is

$$\mathcal{X} = \left\{ (i + U_{ik})/2 : U_{ik} \sim U(0, 1),\ i = 0, 1, \ldots, 11,\ k = 1, \ldots, n^x_i \right\},$$

similarly,

$$\mathcal{Y} = \left\{ (i + U_{ik})/2 : U_{ik} \sim U(0, r_y),\ i = 0, 1, \ldots, 12,\ k = 1, \ldots, n^y_i \right\},$$

and

$$\mathcal{Z} = \left\{ (i + U_{ik})/2 : U_{ik} \sim U(0, r_z),\ i = 0, 1, \ldots, 12,\ k = 1, \ldots, n^z_i \right\}.$$

Note that when $r_y = r_z = 1$ and $\eta_y = \eta_z = 0$, we obtain the null case of distributional equality among $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Z}$. The alternative cases we consider are

$$(r_y, r_z, \eta_y, \eta_z) \in \{(1.1, 1.0, 0, 0), (1.1, 1.2, 0, 0), (1.0, 1.0, 10, 0), (1.0, 1.0, 10, 10), (1.0, 1.0, 10, 30)\}.$$
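The construction of the alternatives can be sketched in the same way. This minimal Python sketch (function names hypothetical) builds the modified frequencies $\nu^y$ from $\eta_y$ and draws the within-bin offsets from $U(0, r_y)$. It uses `sum(nu)` in place of the constant 11659, which coincides for the counts listed in Section 5.2.1. Setting $r = 1$ and $\eta = 0$ recovers the null generator, since the 13th bin then has zero weight.

```python
import random
from typing import List, Optional, Sequence

NU = (2059, 1898, 1764, 1670, 1469, 1268, 814, 417, 142, 81, 61, 16)  # sums to 11659


def shifted_weights(nu: Sequence[int], eta: float) -> List[float]:
    """nu^eta: the entries |nu_i - eta| in descending order, plus a 13th bin
    holding the remaining mass sum(nu) - sum_i |nu_i - eta|."""
    moved = sorted((abs(v - eta) for v in nu), reverse=True)
    return moved + [sum(nu) - sum(moved)]


def simulate_alternative(n: int, r: float, eta: float,
                         rng: Optional[random.Random] = None) -> List[float]:
    """Simulate n distances (i + U) / 2, with i ~ nu^eta on {0, ..., 12} and U ~ U(0, r)."""
    rng = rng if rng is not None else random.Random()
    weights = shifted_weights(NU, eta)
    bins = rng.choices(range(len(weights)), weights=weights, k=n)
    return [(i + rng.uniform(0.0, r)) / 2.0 for i in bins]


# One alternative from the list above: (r_y, eta_y) = (1.0, 10).
sample_y = simulate_alternative(5000, r=1.0, eta=10, rng=random.Random(7))
```

For $\eta = 10$ every count exceeds 10, so each bin loses 10 voxels and the new bin 12 receives $12 \times 10 = 120$; larger $\eta$ moves more mass to large distances and so increases the spread.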

See Figurc[T2]for the kernel density estimates of sample distances under the null case and various alternatives. 
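The generation scheme above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the code used in the study: the frequency vector `V_HYPOTHETICAL` is a made-up placeholder (in descending order, summing to 11659, with every entry at least 50 so that all frequencies stay non-negative for $\eta \le 50$), and drawing one uniform number per sampled label is used as a distributionally equivalent shortcut for drawing, for each $i$, as many uniform numbers as $i$ occurs.

```python
import numpy as np

# Hypothetical bin frequencies v_0, ..., v_11 (descending, total 11659);
# a placeholder, not the VMPFC frequencies used in the study.
V_HYPOTHETICAL = np.array([2500, 2100, 1700, 1400, 1100, 850,
                           650, 480, 350, 250, 159, 120])


def bin_probabilities(v, eta):
    """Probabilities for labels 0..12: the entries |v_i - eta| sorted in
    descending order, plus label 12 with frequency sum(v) - sum|v_i - eta|."""
    v = np.asarray(v, dtype=float)
    dev = np.sort(np.abs(v - eta))[::-1]
    freqs = np.append(dev, v.sum() - dev.sum())
    if np.any(freqs < 0):
        raise ValueError("eta is too large for these frequencies")
    return freqs / freqs.sum()


def simulate_distances(n, v, r=1.0, eta=0.0, rng=None):
    """Draw n labels i from bin_probabilities(v, eta) and return the
    simulated distances (i + U) / 2 with U ~ U(0, r)."""
    rng = np.random.default_rng(rng)
    p = bin_probabilities(v, eta)
    labels = rng.choice(len(p), size=n, p=p)
    return (labels + rng.uniform(0.0, r, size=n)) / 2.0


# Sample X (r = 1, eta = 0) and a sample Y under the alternative (1.0, 10).
x = simulate_distances(1000, V_HYPOTHETICAL, rng=1)
y = simulate_distances(1000, V_HYPOTHETICAL, r=1.0, eta=10, rng=2)
```

Because the placeholder vector is already sorted in descending order, the default call ($r = 1$, $\eta = 0$) yields an X-type sample whose bin-12 frequency is zero; with real frequencies that are not sorted, X would be drawn directly from $\mathbf{v}$ as in the definition of $\mathcal{X}$.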

We repeat these sample generations $N_{mc} = 10000$ times. We count the number of times the null hypothesis is rejected for the B-F test of HOV, the K-W test of distributional equality, and the ANOVA F-tests (with and without HOV) of equality of mean distances, thereby obtaining the empirical power estimates under $H_a$, which are provided in Table 12, where $\beta_{BF}$ is the empirical power estimate for the B-F test, $\beta_{KW}$ is for the K-W test, $\beta_{F_1}$ is for the ANOVA F-test with HOV, and $\beta_{F_2}$ is for the ANOVA F-test without HOV. Using the asymptotic normality of the empirical power estimates, we observe that under $H_a$ with $(r_y, r_z, \eta_y, \eta_z) \in \{(1.1, 1.0, 0, 0), (1.1, 1.2, 0, 0)\}$ the variances of the distances are not that different, so the power estimates for the B-F test remain around .05 (see Figure 12 (left) and Table 12). But the distributions are different, so the further $r_y$ and $r_z$ are from 1.0, the higher the power estimates for the K-W and ANOVA F-tests. Furthermore, as the sample size $n$ increases, the power estimates for the K-W and ANOVA F-tests also increase. Notice that under these alternatives, the K-W test tends to be more powerful than the ANOVA F-tests, since such alternatives influence the ranking (hence the distribution) of the distances more than the mean of the distances. Moreover, under these alternatives it is not the size or scale that really differs; it is the shape that is more emphasized. Here the size component is the distance with respect to the GM/WM surface; i.e., if the GM voxels are at about the same distance from the GM/WM surface, the K-W test is more sensitive to the differences in LCDM distances. We also note that the two ANOVA F-tests have about the same power estimates.
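The rejection counting described above can be sketched with SciPy. This is a hedged illustration, not the study's code: the B-F test is taken as `scipy.stats.levene` with `center="median"` (the Brown-Forsythe variant), the K-W test as `scipy.stats.kruskal`, the ANOVA F-test with HOV as `scipy.stats.f_oneway`, and the F-test without HOV is assumed to be Welch's one-way ANOVA, written out by hand. The sampler `gamma_stand_in` is a hypothetical right-skewed stand-in; passing a sampler built from the generation scheme of Section 5.2.1 would mimic the actual design.

```python
import numpy as np
from scipy import stats


def welch_anova_pvalue(*samples):
    """p-value of Welch's one-way ANOVA (no HOV assumption)."""
    k = len(samples)
    n = np.array([len(s) for s in samples], dtype=float)
    means = np.array([np.mean(s) for s in samples])
    variances = np.array([np.var(s, ddof=1) for s in samples])
    w = n / variances                       # precision weights n_i / s_i^2
    grand = np.sum(w * means) / np.sum(w)   # weighted grand mean
    lam = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    numerator = np.sum(w * (means - grand) ** 2) / (k - 1)
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    df2 = (k ** 2 - 1) / (3 * lam)
    return stats.f.sf(numerator / denominator, k - 1, df2)


def reject_flags(samples, alpha=0.05):
    """Rejection indicators for B-F, K-W, ANOVA F (HOV), Welch ANOVA."""
    pvalues = [
        stats.levene(*samples, center="median").pvalue,  # Brown-Forsythe
        stats.kruskal(*samples).pvalue,
        stats.f_oneway(*samples).pvalue,
        welch_anova_pvalue(*samples),
    ]
    return np.array(pvalues) < alpha


def empirical_rates(draw_samples, n_mc=1000, alpha=0.05, seed=0):
    """Rejection proportions over n_mc replicates (sizes under H_o, powers
    under H_a) with normal-approximation standard errors."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(4)
    for _ in range(n_mc):
        counts += reject_flags(draw_samples(rng), alpha)
    rates = counts / n_mc
    return rates, np.sqrt(rates * (1 - rates) / n_mc)


def gamma_stand_in(rng, sizes=(200, 200, 200), scales=(1.0, 1.0, 1.0)):
    """Hypothetical right-skewed stand-in for three groups of distances."""
    return [rng.gamma(2.0, s, size=n) for n, s in zip(sizes, scales)]


rates, se = empirical_rates(gamma_stand_in, n_mc=200)  # a null case
```

With $N_{mc} = 10000$, the standard error of an empirical rate near .05 is about $\sqrt{0.05 \cdot 0.95 / 10000} \approx 0.0022$, which is the scale on which the size and power estimates in Tables 11 and 12 can be compared.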

Under $H_a$ with $(r_y, r_z, \eta_y, \eta_z) \in \{(1.0, 1.0, 10, 0), (1.0, 1.0, 10, 10), (1.0, 1.0, 10, 30)\}$ the variances of the distances start to differ (see Figure 12 (right) and Table 12); as $\eta_y$ and $\eta_z$ deviate more from 0, the power estimates for the B-F test increase, and so do the power estimates of the K-W and ANOVA F-tests. Note that as $n$ increases, the power estimates also increase under these alternatives. Under this second type of alternatives, the ANOVA F-tests tend to be more powerful, since the right skewness (tail) of the distances is more emphasized, which in turn emphasizes the differences in the mean distances. Under these alternatives, both the size or scale and the shape differ. If the GM voxels are at different distances from the GM/WM surface, the ANOVA F-tests are more sensitive to the differences in LCDM distances. We also note that both ANOVA F-tests have about the same power estimates.

Therefore, based on our Monte Carlo analysis, the spatial correlation between distances has a mild influence on our results. That is, the results based on the B-F, K-W, and ANOVA F-tests on multiple samples are still reliable, although the assumption of within-sample independence is violated. Since normality is also violated, the K-W test has fewer assumption violations than the ANOVA F-tests. However, our Monte Carlo analysis suggests that the K-W test is more sensitive to shape differences for GM of VMPFCs with similar distances to the GM/WM boundary, whereas the ANOVA F-tests are more sensitive to shape differences for GM of VMPFCs with different distances to the boundary.

5.2.3 Empirical Size Estimates for the Two-Sample Case

For the null hypothesis in the two-sample case, we generate two samples X and Y of sizes $n_x$ and $n_y$, respectively. Each sample is generated as described above. We repeat the sample generation $N_{mc} = 10000$ times.



We count the number of times the null hypothesis is rejected at $\alpha = 0.05$ for Lilliefors' test of normality, the B-F test of HOV, the Wilcoxon rank sum test of distributional equality, Welch's t-test of equality of mean distances, and the K-S test of equality of cdfs, thereby obtaining the estimated significance levels. Unlike the multi-sample case, in the two-sample case three types of alternative hypotheses are possible for all tests except Lilliefors' test: two-sided, left-sided, and right-sided. The estimated significance levels are provided in Table 13, where $\alpha_{BF}$ is the empirical size estimate for the B-F test, $\alpha_W$ is for the Wilcoxon rank sum test, $\alpha_t$ is for Welch's t-test, and $\alpha_{KS}$ is for the K-S test. Furthermore, $\alpha_{W,t}$ is the proportion of agreement between the Wilcoxon rank sum and Welch's t-tests, $\alpha_{W,KS}$ is the proportion of agreement between the Wilcoxon rank sum and K-S tests, and $\alpha_{t,KS}$ is the proportion of agreement between Welch's t-test and the K-S test. We omit Lilliefors' test, since by construction our samples are severely non-normal, so normality is rejected for almost all samples generated. Observe that under $H_o$, the empirical significance levels are about the desired level for all three types of alternatives, although the B-F and Wilcoxon tests are slightly liberal, while the K-S test is slightly conservative. Hence, if the distances are not that different, i.e., the frequency of distances for each bin and the distances within each bin are identically distributed for each group, the inherent spatial correlation does not influence the significance levels. However, the Wilcoxon rank sum test, Welch's t-test, and the K-S test test different hypotheses, so their acceptance and rejection regions differ substantially for LCDM distances: the proportion of agreement for each pair is significantly smaller than the minimum of the two empirical size estimates.
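The two-sample size study can be sketched in the same way. The four tests are run two-sided with SciPy (`levene` with `center="median"`, `mannwhitneyu`, `ttest_ind` with `equal_var=False`, and `ks_2samp`). We read the "proportion of agreement" between two tests as the proportion of replicates in which both reject, which is consistent with comparing it against the smaller of the two empirical sizes; this reading is an assumption. The helper `size_flag` marks a rate as liberal or conservative with a normal-approximation z-test at a hypothetical two-sided 5% level, since the exact criterion behind the table markers is not stated here, and `null_pair` is a hypothetical stand-in sampler.

```python
import numpy as np
from scipy import stats

TESTS = ("BF", "W", "t", "KS")
PAIRS = (("W", "t"), ("W", "KS"), ("t", "KS"))


def two_sample_rejections(x, y, alpha=0.05):
    """Two-sided rejection indicators for B-F, Wilcoxon, Welch's t, K-S."""
    p = {
        "BF": stats.levene(x, y, center="median").pvalue,
        "W": stats.mannwhitneyu(x, y, alternative="two-sided").pvalue,
        "t": stats.ttest_ind(x, y, equal_var=False).pvalue,
        "KS": stats.ks_2samp(x, y).pvalue,
    }
    return {name: p[name] < alpha for name in TESTS}


def size_and_agreement(draw_pair, n_mc=500, alpha=0.05, seed=0):
    """Empirical sizes per test and joint-rejection proportions per pair."""
    rng = np.random.default_rng(seed)
    flags = {name: np.zeros(n_mc, dtype=bool) for name in TESTS}
    for m in range(n_mc):
        x, y = draw_pair(rng)
        for name, rejected in two_sample_rejections(x, y, alpha).items():
            flags[name][m] = rejected
    sizes = {name: flags[name].mean() for name in TESTS}
    # Assumed reading of "agreement": both tests reject in the same replicate.
    agreement = {(a, b): np.mean(flags[a] & flags[b]) for a, b in PAIRS}
    return sizes, agreement


def size_flag(rate, n_mc, alpha=0.05, z_crit=1.96):
    """'>' (liberal), '<' (conservative) or '' via a normal approximation."""
    z = (rate - alpha) / np.sqrt(alpha * (1 - alpha) / n_mc)
    if z > z_crit:
        return ">"
    if z < -z_crit:
        return "<"
    return ""


def null_pair(rng, n=300):
    """Hypothetical null case: both samples from the same skewed law."""
    return rng.gamma(2.0, 0.8, size=n), rng.gamma(2.0, 0.8, size=n)


sizes, agreement = size_and_agreement(null_pair, n_mc=300)
markers = {name: size_flag(rate, 300) for name, rate in sizes.items()}
```

By construction, a joint-rejection proportion can never exceed either of the two individual sizes; the substantive question in Table 13 is whether it is significantly smaller than the smaller size.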

5.2.4 Empirical Power Estimates for the Two-Sample Case 

For the alternative hypotheses, we also generate samples X and Y as in Section 5.2.1. Note that when $r_y = 1$ and $\eta_y = 0$, we obtain the null case. The alternative cases we consider are $(r_y, \eta_y) \in \{(1.1, 0), (1.2, 0), (1.0, 10), (1.0, 30), (1.0, 50)\}$. We count the number of times the null hypothesis is rejected for Lilliefors' test of normality, the B-F test of HOV, the Wilcoxon rank sum test of distributional equality, Welch's t-test of equality of mean distances, and the K-S test of equality of cdfs, thereby obtaining the empirical power estimates. The power estimates are provided in Table 14, where $\beta_{BF}$ is the power estimate for the B-F test, $\beta_W$ is for the Wilcoxon rank sum test, $\beta_t$ is for Welch's t-test, and $\beta_{KS}$ is for the K-S test.

Under $H_a$ with $(r_y, \eta_y) \in \{(1.1, 0), (1.2, 0)\}$, the variances of the distances are not that different (see Figure 12 (left) and Table 14), so the power estimates for the B-F test remain around .05. But the distributions start to differ, so as $r_y$ deviates further from 1.0, the power estimates for the Wilcoxon rank sum test, Welch's t-test, and the K-S test increase. Furthermore, as the sample size $n$ increases, the power estimates for these three tests also increase. Observe that, as in the multi-sample case, under these alternatives the Wilcoxon test is more powerful than Welch's t-test, since the ranking of the distances is affected more than the mean distances. But the K-S test has the highest power estimates for sample sizes larger than 1000. Thus, for differences in shape rather than in distance from the GM/WM surface, the K-S and Wilcoxon rank sum tests are more sensitive than Welch's t-test. Furthermore, as the sample sizes increase, the left-sided tests become more powerful than their two-sided counterparts. Notice that we omit the power estimates for the right-sided alternatives: by construction, X values tend to be smaller than Y values under these alternatives, so the tests have virtually no power against right-sided alternatives.
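One practical detail for the left-sided alternatives (X values tend to be smaller than Y values): in SciPy, `mannwhitneyu` and `ttest_ind` take `alternative="less"` for this direction, but `ks_2samp` states its alternatives in terms of the empirical cdfs, so the same direction corresponds to `alternative="greater"` (the cdf of x lies above that of y). The sketch below, with hypothetical gamma stand-in samples, collects the three left-sided p-values; the `alternative` argument of `ttest_ind` requires SciPy 1.6 or later.

```python
import numpy as np
from scipy import stats


def left_sided_pvalues(x, y):
    """p-values for the alternative that X values tend to be smaller than Y."""
    return {
        "W": stats.mannwhitneyu(x, y, alternative="less").pvalue,
        "t": stats.ttest_ind(x, y, equal_var=False, alternative="less").pvalue,
        # ks_2samp states alternatives via cdfs: "greater" means the cdf of x
        # exceeds that of y somewhere, i.e. x tends to take smaller values.
        "KS": stats.ks_2samp(x, y, alternative="greater").pvalue,
    }


rng = np.random.default_rng(3)
x = rng.gamma(2.0, 0.8, size=2000)  # hypothetical stand-in samples;
y = rng.gamma(2.0, 1.0, size=2000)  # Y is stochastically larger than X
left = left_sided_pvalues(x, y)
```

Mixing up the K-S convention would silently turn a left-sided test into a right-sided one, which under these alternatives has virtually no power.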

Under $H_a$ with $(r_y, \eta_y) \in \{(1.0, 10), (1.0, 30), (1.0, 50)\}$, the variances of the distances start to differ (see Figure 12 (right) and Table 14); as $\eta_y$ deviates more from 0, the power estimates for the B-F test increase, and so do the power estimates of the Wilcoxon, Welch's t-, and K-S tests. Note that as $n$ increases, the power estimates also increase under each alternative. Under these alternatives, Welch's t-test is more powerful than the Wilcoxon test, since the mean distances are more affected than the rankings. However, the K-S test has higher power estimates for larger deviations from the null case. These alternatives imply that the distances of the GM voxels are at different scales; here Welch's t-test performs best for small differences, while the K-S test performs best for large differences. Furthermore, as the sample sizes increase, the left-sided tests become more powerful than their two-sided counterparts. Again, we omit the power estimates for the right-sided alternatives because, by construction, X values tend to be smaller than Y values under these alternatives.

We do not report the power estimates for Lilliefors' test of normality, since by construction our data are severely non-normal, and we get power estimates of 1.000 under both the null and the alternative cases.

Therefore, based on our Monte Carlo analysis, the spatial correlation between distances has a mild influence, if any, on our results. That is, the results based on the Wilcoxon rank sum test, Welch's t-test, the K-S test, and the B-F test for two samples are still reliable, although the assumption of within-sample independence is violated. However, the Wilcoxon rank sum test is more sensitive to shape differences of GM of VMPFCs with similar distances from the GM/WM boundary, while Welch's t-test is more sensitive to differences of GM tissue with different distances from the boundary.

6 Discussion and Conclusions 

In this article, we investigate various uses of LCDM distances to detect differences in the morphometry of brain tissues due to factors such as a disease or disorder. As an illustrative example, we use GM tissue in Ventral Medial Prefrontal Cortices (VMPFCs) for three groups of subjects: subjects with major depressive disorder (MDD), subjects at high risk for depression (HR), and control subjects (Ctrl). Our study comprises (MDD, HR) and (Ctrl, Ctrl) co-twin pairs. Since we focus on the use of LCDM distances rather than on the clinical implications of the genetic influence (due to twinness), we ignore the twin influence for most of the analysis in this article (except in the comparison of MDD and HR volumes and descriptive measures).

An LCDM distance data set comprises a large set of distances, whose number depends on the voxel size used to partition the tissue (GM of VMPFCs). First, as a preliminary step, we use simple morphometric measures based on LCDM distances. These simple measures include volume (a multiple of the number of LCDM distances) and descriptive measures such as the median, mode, range, and variance of LCDM distances. The location of the distribution of LCDM distances (e.g., the median) provides information on the (average) cortical mantle thickness, whereas its scale (e.g., standard deviation or variance) provides information on variation in morphometry (shape and/or size). More variation in the distances can result from a thicker cortical mantle or from more variation in the surface structure. In the analysis of these descriptive summary measures, we can use either nonparametric or parametric tests, since most of the time the assumptions were met for both types of tests. For example, for multi-group comparisons we could apply the Kruskal-Wallis (K-W) test or the ANOVA F-test. Each of these measures conveys information on some aspect of the morphometry of VMPFCs. Their analysis might provide a preliminary assessment of differences in morphometry, although summarizing the distances in this way discards most of the information they convey. For example, in our data set, volumes and descriptive measures do not indicate much separation between groups due to depressive disorders.
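The simple summary measures listed above can be computed from one subject's vector of LCDM distances as sketched below. The voxel volume of 0.125 mm³ (0.5 mm isotropic voxels) is a hypothetical default, and the mode is taken as the midpoint of the fullest histogram bin, which is a choice made here rather than a procedure stated in the text.

```python
import numpy as np


def lcdm_summaries(distances, voxel_volume_mm3=0.125, bin_width=0.5):
    """Simple morphometric summaries of one subject's LCDM distances (mm).

    voxel_volume_mm3=0.125 assumes hypothetical 0.5 mm isotropic voxels; the
    mode is the midpoint of the fullest histogram bin of about bin_width mm.
    """
    d = np.asarray(distances, dtype=float)
    n_bins = max(1, int(np.ceil(np.ptp(d) / bin_width)))
    counts, edges = np.histogram(d, bins=n_bins)
    k = int(np.argmax(counts))
    return {
        "n_voxels": d.size,
        "volume_mm3": d.size * voxel_volume_mm3,   # volume is a multiple of n
        "median": float(np.median(d)),             # location: mantle thickness
        "mode": float((edges[k] + edges[k + 1]) / 2),
        "range": float(np.ptp(d)),
        "variance": float(np.var(d, ddof=1)),      # scale: morphometric variation
    }


summary = lcdm_summaries([0.5, 1.0, 1.0, 1.2, 2.0, 3.5])
```

For this toy input the summaries are 6 voxels, a volume of 0.75 mm³, a median of 1.1 mm, a modal bin centered at 1.25 mm, and a range of 3.0 mm.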

Since the descriptive measures of LCDM distances are summary statistics, they tend to oversimplify the data, and hence we lose most of the information conveyed by the LCDM distances. To avoid the loss of information incurred by using descriptive summary statistics of LCDM distances, as performed in [33], we pool the LCDM distances of subjects from the same group, assuming that distances in the same group have similar distributions. As a precautionary step, we identify the extreme (outlier) subjects, i.e., the subjects whose VMPFCs have very different distributions from the rest. Kernel density estimates (or normalized histograms) can be used as an exploratory tool to detect such outliers. The pooled LCDM distances can be used to detect group differences in morphometry, left-right morphometric asymmetry, and stochastic ordering of the distances. Since LCDM is designed to measure cortical mantle thickness (with respect to the GM/WM surface), it naturally captures size differences in the direction normal to the surface. However, the "width" (i.e., the extent of VMPFC parallel to the surface) is less relevant, so LCDM tends to ignore size differences in the direction parallel to the surface.
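Pooling, and an exploratory screen for extreme subjects, can be sketched with `scipy.stats.gaussian_kde`. The text relies on visual inspection of kernel density plots; the numerical score below (the L1 distance between a subject's density estimate and that of the remaining subjects in the same group, approximated on a grid) is a swapped-in heuristic for ranking subjects before such an inspection, and the subjects and their distances are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde


def pool_by_group(distances_by_subject, group_of):
    """Concatenate (pool) the LCDM distances of subjects in the same group."""
    parts = {}
    for subject, d in distances_by_subject.items():
        parts.setdefault(group_of[subject], []).append(np.asarray(d, float))
    return {group: np.concatenate(ds) for group, ds in parts.items()}


def kde_discrepancy(distances_by_subject, grid):
    """Approximate L1 distance between each subject's kernel density estimate
    and that of the other subjects pooled (a screening heuristic)."""
    dx = grid[1] - grid[0]
    scores = {}
    for subject, d in distances_by_subject.items():
        rest = np.concatenate([v for s, v in distances_by_subject.items()
                               if s != subject])
        gap = np.abs(gaussian_kde(d)(grid) - gaussian_kde(rest)(grid))
        scores[subject] = float(np.sum(gap) * dx)  # Riemann sum on the grid
    return scores


rng = np.random.default_rng(7)
subjects = {f"s{k}": rng.gamma(2.0, 0.8, size=1500) for k in range(4)}
subjects["s4"] = rng.gamma(2.0, 1.6, size=1500)  # deliberately atypical
scores = kde_discrepancy(subjects, np.linspace(-1.0, 12.0, 651))
pooled = pool_by_group(subjects, {s: "Ctrl" for s in subjects})
```

Subjects with the largest scores are candidates for closer inspection of their density plots; the decision to remove a subject would still rest on that inspection.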

We apply parametric tests (e.g., ANOVA F-test and Welch's t-test) and nonparametric tests (K-W test and Wilcoxon rank sum test) for multi-group and two-group comparisons of LCDM distances. We use the Brown-Forsythe (B-F) test for homogeneity of variances (HOV) and the Kolmogorov-Smirnov (K-S) test for cdf comparisons. The parametric tests require normality, and all of these tests require within-sample independence. The pooled LCDM distances are extremely non-normal due to heavy right skewness, and within-sample independence is violated due to the spatial correlation between LCDM distances of neighboring voxels. However, our extensive Monte Carlo study reveals that the influence of these violations is very mild, if not negligible. Applying the K-S test to LCDM distances may reveal a stochastic ordering of LCDM distances, if present. Although the Wilcoxon rank sum test and the K-S test have the same null hypothesis, the alternatives and the information they provide are different. If the K-S test fails to reject the null hypothesis, there are no significant distributional differences over the whole range of the variable, while if the Wilcoxon test fails to reject the null hypothesis, the location parameter, or more precisely the ranking of the variables, is not significantly different. If the K-S test rejects the null hypothesis and there is no stochastic ordering, then the direction of the distributional differences varies at different values of the variable (LCDM distance).
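The last point can be illustrated with two hypothetical samples that share a center but differ in spread, so that their cdfs cross. In such a case the K-S test typically rejects while the Wilcoxon rank sum test typically does not, and the difference of the empirical cdfs changes sign, indicating that there is no stochastic ordering. A minimal sketch:

```python
import numpy as np
from scipy import stats


def ecdf_difference(x, y, grid):
    """F_x(t) - F_y(t) on a grid, from the two empirical cdfs."""
    fx = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    fy = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return fx - fy


rng = np.random.default_rng(11)
x = rng.normal(1.5, 0.5, size=2000)  # hypothetical: same center,
y = rng.normal(1.5, 1.0, size=2000)  # different spread
diff = ecdf_difference(x, y, np.linspace(-2.0, 5.0, 701))
p_ks = stats.ks_2samp(x, y).pvalue
p_wilcoxon = stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
crosses = diff.min() < 0 < diff.max()  # sign change: no stochastic ordering
```

If instead `diff` stays non-negative (non-positive) over the grid, X tends to be stochastically smaller (larger) than Y, which is the situation in which a one-sided K-S test is informative.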

Left-right morphometric asymmetry can also be detected using LCDM distances (with the Wilcoxon rank sum test and Welch's t-test). Such asymmetry might be due to asymmetry in shape and/or size. In terms of size asymmetry, LCDM emphasizes mantle thickness asymmetry rather than mantle width asymmetry.

We demonstrate that pooled LCDM distances are a powerful tool to detect various types of morphometric differences. For the illustrative example used in this article, the analysis of LCDM distances indicates the following. Left and right VMPFC distances tend to decrease with depressive disorders or with being at high risk for depression, possibly due to thinning of the left and right VMPFCs. The morphometric variation in left and right VMPFCs is reduced in subjects suffering from, or at high risk for, depressive disorders compared to Ctrl subjects, and is smallest for HR subjects in both left and right VMPFCs. There is significant left-right asymmetry in LCDM distances, in the sense that the cortical mantle in the left VMPFC is thinner for MDD and Ctrl subjects and thicker for HR subjects compared to their right counterparts. Moreover, the analysis of LCDM distances yields the stochastic ordering HR $<_{ST}$ MDD $<_{ST}$ Ctrl for both left and right VMPFCs; i.e., HR-left distances are more likely to be smaller than MDD-left and Ctrl-left distances, and MDD-left distances are more likely to be smaller than Ctrl-left distances. That is, the left VMPFC of HR subjects is more likely to be thinner than that of MDD subjects, and the left VMPFC of MDD subjects is more likely to be thinner than that of Ctrl subjects. The same holds for right VMPFCs. The corresponding clinical findings, together with their interpretations, have been described elsewhere [44]. Note that pooled distances are only suggestive of morphometric differences and do not provide information on the location of these differences. This aspect of LCDM analysis is a topic of ongoing research.

Observe also that LCDM distances provide information on morphometry, both shape and size (especially size in the direction normal to the interface, i.e., thickness). If the distances are adjusted for size (e.g., volume), LCDM distances will provide only shape information. The size (or scale) adjustment of LCDM distances is also a topic of ongoing research.

Finally, we emphasize that the methodology used in this article for VMPFC shape differences can be used for other tissues or organs of humans and animals, as well as for other distances similar to LCDM distances.
Tables





volume (mm³)
                    left                  right
group      n        mean      std dev     mean      std dev
MDD        20       1680.7    248.2       1607.8    220.0
HR         20       1601.6    235.6       1589.8    239.5
Ctrl       28       1700.6    295.3       1676.3    297.5
overall    68       1665.6    264.9       1630.7    259.2

Table 1: The sample sizes (n), means, and standard deviations (std dev) of the volumes for left and right VMPFCs overall and for each group.



p-values for pairwise volume (mm³) comparisons

               left                      right
pair           p_W         p_t           p_W         p_t
MDD, HR        .2145 (g)   .2400 (g)     .3990 (g)   .4068 (g)
MDD, Ctrl      .4794 (ℓ)   .4006 (ℓ)     .3990 (ℓ)   .4068 (ℓ)
HR, Ctrl       .2145 (ℓ)   .2400 (ℓ)     .3990 (ℓ)   .4068 (ℓ)

Table 2: The p-values for the pairwise comparisons of the mean volumes with pairwise Wilcoxon rank sum tests and Welch's t-tests. p_W: p-value based on the Wilcoxon rank sum test and p_t: p-value based on Welch's t-test; g (ℓ) stands for the greater (less) than alternative.



p-values for left-right volume (mm³) asymmetry

        overall      MDD          HR          Ctrl
p_W     .0064* (g)   .0360* (g)   .2545 (g)   .2545 (g)
p_t     .0087* (g)   .0233* (g)   .3376 (g)   .2436 (g)

Table 3: The p-values for the tests of left-right volume asymmetry. p_W: p-value based on the Wilcoxon signed rank test and p_t: p-value based on Welch's t-test; g (ℓ) stands for the greater (less) than alternative. Significant p-values at α = 0.05 are marked with an *.



correlation coefficients

      overall (L,R)  MDD (L,R)  HR (L,R)   Ctrl (L,R)  MDD_R, HR_L  MDD_R, HR_R
ρ_S   .8882          .8120      .8556      .9425       .4120        .3158
p     < .0001*       < .0001*   < .0001*   < .0001*    .0359*       .0868

Table 4: The Spearman correlation coefficients (denoted by ρ_S) between left and right VMPFC volumes and the associated p-values for the alternative that the correlation coefficient is non-zero. Significant p-values at α = 0.05 are marked with an *.



p-values for cdf comparisons

               volume (mm³)
pair           left         right
MDD, Ctrl      .2735 (ℓ)    .1489 (g)
HR, Ctrl       .1792 (g)    .1489 (g)

Table 5: The p-values based on the K-S test for the cdf comparisons (overall and by group) of the volumes; g (ℓ) stands for the greater (less) than alternative.





             Left VMPFC                          Right VMPFC
Group        n        mean   median  std dev     n        mean   median  std dev
MDD          238937   1.62   1.46    1.13        170534   1.63   1.49    1.10
HR           228224   1.61   1.46    1.11        216978   1.59   1.46    1.08
Ctrl         308498   1.66   1.50    1.14        293479   1.66   1.53    1.12
Overall      775659   1.63   1.48    1.13        680991   1.63   1.50    1.10

Table 6: The sample sizes (n), means, medians, and standard deviations (std dev) of the pooled LCDM distances (in mm) for left and right VMPFCs overall and for each group (after extreme subjects are removed).



With Welch's t-test

              all subjects included           extreme subjects removed
Pair          Left           Right            Left           Right
MDD, HR       .0383* (g)     .0041* (g)       < .0001* (g)   < .0001* (g)
MDD, Ctrl     < .0001* (ℓ)   < .0001* (ℓ)     < .0001* (ℓ)   < .0001* (ℓ)
HR, Ctrl      < .0001* (ℓ)   < .0001* (ℓ)     < .0001* (ℓ)   < .0001* (ℓ)

With Wilcoxon rank sum test

              all subjects included           extreme subjects removed
Pair          Left           Right            Left           Right
MDD, HR       .3022 (ℓ)      .0776 (g)        .0084* (g)     < .0001* (g)
MDD, Ctrl     < .0001* (ℓ)   < .0001* (ℓ)     < .0001* (ℓ)   < .0001* (ℓ)
HR, Ctrl      < .0001* (ℓ)   < .0001* (ℓ)     < .0001* (ℓ)   < .0001* (ℓ)

Table 7: The p-values for the simultaneous pairwise comparisons of the pooled distances by Welch's t and Wilcoxon rank sum tests. The p-values are adjusted by Holm's correction method; g (ℓ) stands for the greater (less) than alternative. Significant p-values at α = 0.05 are marked with an *.
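Tables 7 to 10 report p-values adjusted by Holm's method. A standard Holm step-down adjustment is sketched below; it is a generic implementation, not the code used for the tables, and it assumes the adjustment is applied within each family of comparisons (e.g., the three pairwise group comparisons for one hemisphere).

```python
import numpy as np


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # (m - j + 1) * p_(j) for the ordered p-values, j = 1, ..., m
    stepped = (m - np.arange(m)) * p[order]
    # enforce monotonicity of the adjusted values and cap them at 1
    adjusted_sorted = np.minimum(1.0, np.maximum.accumulate(stepped))
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


# Hypothetical raw p-values for three pairwise comparisons.
adjusted = holm_adjust([0.01, 0.04, 0.03])
```

For example, the hypothetical raw p-values (.01, .04, .03) become (.03, .06, .06) after adjustment, so only the first comparison remains significant at α = 0.05.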





              all subjects included           extreme subjects removed
Pair          Left           Right            Left           Right
MDD, HR       < .0001* (g)   < .0001* (g)     < .0001* (g)   < .0001* (g)
MDD, Ctrl     < .0001* (ℓ)   < .0001* (ℓ)     < .0001* (ℓ)   < .0001* (ℓ)
HR, Ctrl      < .0001* (ℓ)   < .0001* (ℓ)     < .0001* (ℓ)   < .0001* (ℓ)

Table 8: The p-values for the simultaneous pairwise comparisons of the variances of distances by the B-F HOV test. The p-values are adjusted by Holm's correction method; g (ℓ) stands for the greater (less) than alternative. Significant p-values at α = 0.05 are marked with an *.



With Welch's t-test

                    overall        MDD            HR             Ctrl
all subjects        < .0001* (g)   < .0001* (g)   < .0001* (g)   < .0001* (g)
outliers removed    .1439 (g)      .0227* (ℓ)     < .0001* (g)   .0681* (ℓ)

With Wilcoxon rank sum test

                    overall        MDD            HR             Ctrl
all subjects        < .0001* (g)   < .0001* (g)   < .0001* (g)   < .0001* (g)
outliers removed    < .0001* (ℓ)   < .0001* (ℓ)   .0015* (g)     < .0001* (ℓ)

Table 9: The p-values for the tests of left-right distance asymmetry by Welch's t and Wilcoxon rank sum tests. The p-values for groups are adjusted by Holm's correction method; g (ℓ) stands for the greater (less) than alternative. Significant p-values at α = 0.05 are marked with an *.





p-values for cdf comparisons when all subjects included

              Left                                Right
Pair          2-sided    1st < 2nd  1st > 2nd     2-sided    1st < 2nd  1st > 2nd
MDD, HR       < .0001*   < .0001*   .0073*        .0316*     .0158*     .6017
MDD, Ctrl     < .0001*   .8340      < .0001*      < .0001*   .0138*     < .0001*
HR, Ctrl      < .0001*   .8340      < .0001*      < .0001*   .0129*     < .0001*

p-values for cdf comparisons when extreme subjects removed

              Left                                Right
Pair          2-sided    1st < 2nd  1st > 2nd     2-sided    1st < 2nd  1st > 2nd
MDD, HR       < .0001*   < .0001*   .7519         < .0001*   < .0001*   .9585
MDD, Ctrl     < .0001*   .9544      < .0001*      < .0001*   1.000      < .0001*
HR, Ctrl      < .0001*   .9544      < .0001*      < .0001*   1.000      < .0001*

Table 10: The p-values based on the K-S test for the cdf comparisons (overall and by group) of the pooled LCDM distances. The p-values for each type of alternative are adjusted by Holm's correction method. Significant p-values at α = 0.05 are marked with an *.





                         Empirical size                              Prop. of agreement
(n_x, n_y, n_z)          α_BF      α_KW      α_F1        α_F2        α_KW,F1   α_KW,F2   α_F1,F2
(1000, 1000, 1000)       .0508^a   .0511^a   .0508^a     .0506^a     .0417'    .0419'    .0499~
(5000, 5000, 10000)      .0516^a   .0495^a   .0498^a     .0497^a     .0386'    .0386'    .0491~
(5000, 7500, 10000)      .0499^a   .0480^a   .0451^a,<   .0449^a,<   .0368'    .0369'    .0446~
(10000, 10000, 10000)    .0485^a   .0483^a   .0483^a     .0480^a     .0392'    .0392'    .0477~

Table 11: Estimated significance levels and proportions of agreement between the tests based on Monte Carlo simulation of distances with three groups X, Y, and Z of sizes n_x, n_y, and n_z, respectively, with N_mc = 10000 Monte Carlo replicates. α_BF is the empirical size estimate for the B-F test, α_KW is for the K-W test, and α_F1 and α_F2 are for the ANOVA F-tests with and without HOV, respectively; α_KW,F1 is the proportion of agreement between the K-W test and the ANOVA F-test with HOV, α_KW,F2 is the proportion of agreement between the K-W test and the ANOVA F-test without HOV, and α_F1,F2 is the proportion of agreement between the ANOVA F-tests with and without HOV. Empirical sizes with the same superscript are not significantly different from each other. >: the empirical size is significantly larger than 0.05, i.e., the method is liberal. <: the empirical size is significantly smaller than 0.05, i.e., the method is conservative. ': the proportion of agreement is significantly less than the minimum of the empirical sizes. ~: the proportion of agreement is not significantly less than the minimum of the empirical sizes.



(r_y, r_z) = (1.1, 1.0); (η_y, η_z) = (0, 0)
(n_x, n_y, n_z)          β_BF    β_KW    β_F1    β_F2
(1000, 1000, 1000)       .0511   .0778   .0770   .0768
(5000, 5000, 10000)      .0511   .2281   .2137   .2114
(5000, 10000, 5000)      .0512   .2936   .2731   .2745
(5000, 10000, 7500)      .0508   .3244   .2939   .2947
(10000, 10000, 10000)    .0482   .3900   .3564   .3559

(r_y, r_z) = (1.1, 1.2); (η_y, η_z) = (0, 0)
(n_x, n_y, n_z)          β_BF    β_KW    β_F1    β_F2
(1000, 1000, 1000)       .0516   .1396   .1316   .1313
(5000, 5000, 10000)      .0519   .6725   .6315   .6317
(10000, 5000, 5000)      .0503   .6651   .6262   .6253
(5000, 10000, 5000)      .0516   .5296   .4828   .4828
(10000, 10000, 10000)    .0490   .8410   .8050   .8050

(r_y, r_z) = (1.0, 1.0); (η_y, η_z) = (10, 0)
(n_x, n_y, n_z)          β_BF    β_KW    β_F1    β_F2
(1000, 1000, 1000)       .0899   .0574   .0728   .0721
(5000, 5000, 10000)      .3408   .0767   .1930   .1854
(5000, 10000, 5000)      .4275   .0884   .2341   .2381
(5000, 7500, 10000)      .4378   .0832   .2415   .2360
(5000, 10000, 7500)      .4713   .0878   .2571   .2584
(10000, 10000, 10000)    .5564   .1006   .3127   .3061

(r_y, r_z) = (1.0, 1.0); (η_y, η_z) = (10, 30)
(n_x, n_y, n_z)          β_BF    β_KW    β_F1    β_F2
(1000, 1000, 1000)       .2236   .0963   .1519   .1512
(5000, 5000, 10000)      .9255   .3986   .7436   .7537
(10000, 5000, 5000)      .9186   .3556   .7175   .7071
(5000, 10000, 5000)      .8083   .2908   .5826   .5831
(5000, 7500, 10000)      .9214   .4191   .7578   .7627
(10000, 7500, 5000)      .9144   .3652   .7229   .7147
(10000, 5000, 7500)      .9643   .4554   .8260   .8226
(7500, 5000, 10000)      .9644   .4739   .8331   .8363
(7500, 10000, 5000)      .8765   .3421   .6743   .6702
(5000, 10000, 7500)      .8811   .3752   .6938   .6983
(10000, 10000, 10000)    .9851   .5352   .8842   .8835

Table 12: The power estimates based on Monte Carlo simulation of distances with three groups X, Y, and Z of sizes n_x, n_y, and n_z, respectively, with N_mc = 10000 Monte Carlo replicates. For the parameters r_y, r_z, η_y, and η_z, see Section 5.2.2. β_BF is the empirical power estimate for the B-F test, β_KW is for the K-W test, and β_F1 and β_F2 are for the ANOVA F-tests with and without HOV, respectively.



Two-Sided Tests
                    Empirical size                         Prop. of agreement
(n_x, n_y)          α_BF     α_W      α_t      α_KS       α_W,t    α_W,KS   α_t,KS
(1000, 1000)        .0514    .0517    .0505    .0486      .0403'   .0305'   .0273'
(5000, 10000)       .0533    .0457<   .0463<   .0465      .0356'   .0273'   .0244'
(7500, 10000)       .0486    .0493    .0463<   .0464      .0385'   .0282'   .0246'
(10000, 10000)      .0525    .0518    .0525    .0501      .0421'   .0320'   .0281'

Left-Sided Tests (i.e., X values tend to be smaller than Y values)
                    Empirical size                         Prop. of agreement
(n_x, n_y)          α_BF     α_W      α_t      α_KS       α_W,t    α_W,KS   α_t,KS
(1000, 1000)        .0503    .0517    .0527    .0486      .0440'   .0329'   .0305'
(5000, 10000)       .0512    .0470    .0489    .0492      .0382'   .0311'   .0282'
(7500, 10000)       .0521    .0490    .0493    .0478      .0399'   .0322'   .0284'
(10000, 10000)      .0489    .0517    .0514    .0494      .0426'   .0330'   .0301'

Right-Sided Tests (i.e., X values tend to be larger than Y values)
                    Empirical size                         Prop. of agreement
(n_x, n_y)          α_BF     α_W      α_t      α_KS       α_W,t    α_W,KS   α_t,KS
(1000, 1000)        .0514    .0521    .0502    .0491      .0409'   .0337'   .0294'
(5000, 10000)       .0485    .0486    .0502    .0478      .0405'   .0308'   .0285'
(7500, 10000)       .0493    .0479    .0469    .0495      .0391'   .0325'   .0287'
(10000, 10000)      .0549>   .0532    .0517    .0469      .0435'   .0354'   .0311'

Table 13: Estimated significance levels based on Monte Carlo simulation of distances with two groups X and Y of sizes n_x and n_y, respectively, with N_mc = 10000 Monte Carlo replicates. α_BF is the empirical size estimate for the B-F test, α_W is for the Wilcoxon rank sum test, α_t is for Welch's t-test, and α_KS is for the K-S test; α_W,t is the proportion of agreement between the Wilcoxon rank sum and Welch's t-tests, α_W,KS is the proportion of agreement between the Wilcoxon rank sum and K-S tests, and α_t,KS is the proportion of agreement between Welch's t-test and the K-S test. >: the empirical size is significantly larger than 0.05, i.e., the method is liberal. <: the empirical size is significantly smaller than 0.05, i.e., the method is conservative. ': the proportion of agreement is significantly less than the minimum of the empirical sizes. ~: the proportion of agreement is not significantly less than the minimum of the empirical sizes.



Ty = 1.1; T]y = 




Two-Sided 


Left-Sided 




Pbf 


flw 


A 


Pks 


Pbf 


Pw 


Pt 


(3ks 


(1000,1000) 


.0517 


.1317 


.1264 


.0788 


.0524 


.0742 


.0712 


.0750 


(5000,10000) 


.0529 


.2723 


.2520 


.3734 


.0568 


.3816 


.3600 


.5122 


(10000,5000) 


.0471 


.2720 


.2507 


.3753 


.0522 


.3838 


.3572 


.5157 


(7500,10000) 


.0491 


.3242 


.3046 


.4731 


.0570 


.4425 


.4178 


.6139 


(10000,7500) 


.0515 


.3305 


.3100 


.4850 


.0551 


.4455 


.4204 


.6253 


(10000,10000) 


.0498 


.3662 


.3362 


.5504 


.0521 


.4924 


.4588 


.6861 


r y = 1.2; % =0 


(1000,1000) 


.0511 


.2635 


.2533 


.1838 


.0527 


.1695 


.1630 


.1813 


(5000,10000) 


.0547 


.7606 


.7331 


.9401 


.0663 


.8463 


.8250 


.9755 


(10000,5000) 


.0521 


.7588 


.7269 


.9421 


.0627 


.8437 


.8178 


.9765 


(7500,10000) 


.0526 


.8514 


.8282 


.9839 


.0666 


.9121 


.8950 


.9945 


(10000,7500) 


.0545 


.8561 


.8300 


.9845 


.0672 


.9133 


.8969 


.8882 


(10000,10000) 


.0512 


.8976 


.8750 


.9935 


.0624 


.9468 


.9312 


.9982 


r v = 1.0; % = 10 


(1000,1000) 


.0965 


.0772 


.1173 


.0514 


.1009 


.0506 


.0677 


.0477 


(5000,10000) 


.3885 


.0871 


.2222 


.0673 


.5111 


.1361 


.3297 


.1089 


(10000,5000) 


.3687 


.0841 


.2186 


.0670 


.4941 


.1390 


.3232 


.1076 


(7500,10000) 


.4767 


.0951 


.2638 


.0737 


.5964 


.1497 


.3786 


.1159 


(10000,7500) 


.4647 


.0995 


.2630 


.0748 


.5927 


.1560 


.3725 


.1161 


(10000,10000) 


.5253 


.1018 


.2978 


.0743 


.6505 


.1628 


.4132 


.1200 


$r_y = 1.0$; $\eta_y = 30$
(1000,1000)    .2858  .1760  .2887  .0878  .2905  .1028  .1885  .0793
(5000,10000)   .9528  .4677  .8254  .7080  .9769  .5927  .8881  .8911
(10000,5000)   .9452  .4668  .8094  .6901  .9707  .5918  .8807  .8659
(7500,10000)   .9859  .5578  .8987  .9078  .9938  .6773  .9435  .9792
(10000,7500)   .9837  .5509  .8976  .8983  .9919  .6750  .9438  .9713
(10000,10000)  .9932  .6188  .9369  .9691  .9971  .7339  .9679  .9942


$r_y = 1.0$; $\eta_y = 50$
(1000,1000)    .4724  .3361  .4865  .2041  .4789  .2266  .3521  .2048
(5000,10000)   .9989  .8876  .9842  .9980  .9998  .9363  .9936  .9998
(10000,5000)   .9981  .8830  .9844  .9980  .9996  .9325  .9931  1.000
(7500,10000)   .9999  .9478  .9964  1.000  .9732  .9932  .9986  1.000
(10000,7500)   .9998  .9473  .9961  1.000  1.000  .9741  .9984  1.000
(10000,10000)  .9999  .9716  .9984  1.000  1.000  .9847  .9995  1.000



Table 14: The power estimates based on Monte Carlo simulation of distances with two groups, $X$ and $Y$, of sizes $n_x$ and $n_y$, respectively, with $N_{mc} = 10000$ Monte Carlo replicates. For the parameters $r_y$ and $\eta_y$, see Section 5.2.4. $\beta_{bf}$ is the power estimate for the B-F test, $\beta_w$ for the Wilcoxon rank sum test, $\beta_t$ for Welch's $t$-test, and $\beta_{ks}$ for the K-S test.
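Each entry of Table 14 is a Monte Carlo power estimate: the proportion of the simulated data sets in which a test rejects the null hypothesis at the chosen level. The sketch below illustrates that computation under stated assumptions only. The generators (normal distributions with a 0.2 mm shift), the helper names `mean_var`, `welch_pvalue`, and `mc_power`, and the level $\alpha = 0.05$ are hypothetical stand-ins, not the distance model of Section 5.2.4. Only Welch's statistic is shown, and it is referred to a standard normal distribution as a large-sample approximation to Welch's $t$ distribution.

```python
import math
import random


def mean_var(v):
    """Sample mean and unbiased sample variance."""
    m = math.fsum(v) / len(v)
    return m, math.fsum((a - m) ** 2 for a in v) / (len(v) - 1)


def welch_pvalue(x, y):
    """Two-sided p-value for Welch's statistic, referred to N(0, 1).

    Assumption: the normal reference is adequate for large samples
    (hundreds of values or more); it is not the exact Welch t reference.
    """
    mx, vx = mean_var(x)
    my, vy = mean_var(y)
    z = (mx - my) / math.sqrt(vx / len(x) + vy / len(y))
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


def mc_power(gen_x, gen_y, n_x, n_y, n_mc=1000, alpha=0.05, seed=0):
    """Proportion of n_mc simulated data sets in which H0 is rejected at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_mc):
        x = [gen_x(rng) for _ in range(n_x)]
        y = [gen_y(rng) for _ in range(n_y)]
        rejections += welch_pvalue(x, y) < alpha  # True counts as 1
    return rejections / n_mc


def base(rng):
    # Hypothetical distance distribution: normal, mean 2 mm, sd 1 mm.
    return rng.gauss(2.0, 1.0)


def shifted(rng):
    # Hypothetical alternative: the same distribution shifted by 0.2 mm.
    return rng.gauss(2.2, 1.0)


size = mc_power(base, base, 200, 200)      # estimated type I error, near alpha
power = mc_power(base, shifted, 200, 200)  # estimated power against the shift
print(f"size = {size:.3f}, power = {power:.3f}")
```

Substituting the p-value of another test (for example the B-F, Wilcoxon rank sum, or K-S test) for `welch_pvalue` gives the corresponding power estimate; the replication loop itself does not change.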
Figure 1: A schematic flowchart of the LCDM measurement procedure.




Figure 2: A two-dimensional illustration of the normal distances from a GM voxel and a WM voxel to the GM/WM interface (left) and non-normalized histograms of the LCDM distances of GM, WM, and CSF tissues (right).




Figure 3: The location of the VMPFC in the brain.




Figure 4: Depicted are the scatter plots of the LCDM distances for the left and right VMPFCs by subject, color-coded by group. The horizontal lines are located at -0.5 and 5.5 mm.



Figure 5: Depicted are the (slightly jittered) scatter plots of the volumes of the left and right VMPFCs. The crosses (×) mark the mean volume of each group.




Figure 6: Depicted are the plots of the kernel density estimates of the LCDM distances for the left and right VMPFCs by subject.
Figure 7: Depicted are the plots of the kernel density estimates of the pooled LCDM distances by group for the left and right VMPFCs when extreme subjects are removed.
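Pooling merges the LCDM distances of all subjects in a group into a single sample, and a kernel density estimate then smooths that sample into a curve like those in Figures 6 and 7. The sketch below is illustrative only: the Gaussian kernel, the normal-reference bandwidth $h = 1.06\,s\,n^{-1/5}$, the per-subject values, and the function names are assumptions, since the kernel and bandwidth behind the figures are not stated here.

```python
import math


def normal_reference_bandwidth(data):
    """Rule-of-thumb bandwidth h = 1.06 * s * n^(-1/5) for a Gaussian kernel."""
    n = len(data)
    m = math.fsum(data) / n
    s = math.sqrt(math.fsum((v - m) ** 2 for v in data) / (n - 1))
    return 1.06 * s * n ** (-0.2)


def gaussian_kde(data, grid, h=None):
    """Gaussian kernel density estimate of `data`, evaluated at each point of `grid`."""
    if h is None:
        h = normal_reference_bandwidth(data)
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return [norm * math.fsum(math.exp(-0.5 * ((g - v) / h) ** 2) for v in data)
            for g in grid]


# Pooling: merge the distances of all subjects in one group into one sample.
# The per-subject values (mm) are hypothetical.
subject_distances = [[0.4, 1.1, 1.9, 2.6], [0.7, 1.5, 2.2, 3.0], [0.2, 1.3, 2.0, 2.8]]
pooled = [d for subject in subject_distances for d in subject]

grid = [-1.0 + 0.25 * i for i in range(25)]  # -1.0 mm to 5.0 mm
density = gaussian_kde(pooled, grid)
```

Because pooling concatenates the samples, a subject with more voxels contributes proportionally more to the pooled density than a subject with fewer voxels.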
Figure 8: Depicted are the plots of the empirical cdfs of the LCDM distances for the left and right VMPFCs by subject (color-coded by group) when extreme subjects are removed.
Figure 9: Depicted are the plots of the empirical cdfs of the pooled LCDM distances for the left and right VMPFCs when extreme subjects are removed.
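The empirical cdf plotted in Figures 8 and 9 is $F_n(t) = \#\{x_i \le t\}/n$, and the K-S test used in Table 14 is based on the largest vertical gap between two such curves, $D = \sup_t |F_x(t) - F_y(t)|$. The minimal sketch below computes both; the function names and sample values are hypothetical. Since both cdfs are right-continuous step functions that jump only at observed values, it suffices to check $D$ at the pooled sample points.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return F_n, the empirical cdf of `sample`: F_n(t) = #{x_i <= t} / n."""
    xs = sorted(sample)
    n = len(xs)
    return lambda t: bisect_right(xs, t) / n


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D = sup_t |F_x(t) - F_y(t)|.

    The difference of the two step functions changes only at observed
    values, so evaluating it at the pooled sample points is enough.
    """
    fx, fy = ecdf(x), ecdf(y)
    return max(abs(fx(t) - fy(t)) for t in set(x) | set(y))


# Hypothetical pooled distances (mm) for two groups.
group_a = [0.5, 1.0, 1.5, 2.0, 2.5]
group_b = [1.0, 2.0, 3.0, 4.0]
d = ks_statistic(group_a, group_b)
print(d)
```

Here the largest gap occurs at $t = 2.5$, where $F_a(2.5) = 1$ and $F_b(2.5) = 0.5$.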
Figure 10: Depicted are the plots of the LCDM distances for the left VMPFC of HR subject 1. The left plot shows the distances stacked in intervals of width 0.5 mm, and the right plot shows the sorted distances.
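A stacked plot of this kind can be read as a count of the distances falling in each 0.5 mm interval, with one stacked mark per distance. The minimal sketch below computes those stack heights and the sorted values shown in the right-hand panel; the half-open intervals $[0.5k, 0.5(k+1))$, the function name `stack_counts`, and the sample values are assumptions made for illustration.

```python
import math
from collections import Counter


def stack_counts(distances, width=0.5):
    """Count the distances falling in each interval [k*width, (k+1)*width).

    Returns a dict mapping each interval's left endpoint to its count;
    these counts are the stack heights in a stacked dot plot.
    """
    counts = Counter(math.floor(d / width) for d in distances)
    return {k * width: counts[k] for k in sorted(counts)}


# Hypothetical LCDM distances in mm.
sample = [-0.3, 0.1, 0.4, 0.6, 0.9, 1.2, 1.3, 2.7]
print(stack_counts(sample))
print(sorted(sample))  # the values in the sorted-distance panel
```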
Figure 11: Depicted are the plots of the data values generated by Monte Carlo simulation to resemble LCDM distances. The left plot shows the distances stacked in intervals of width 0.5, and the right plot shows the sorted distances.




Figure 12: Depicted are the plots of the kernel density estimates of the Monte Carlo simulated LCDM distances under the null case and alternatives with $\eta_z = 0$ and $r_y \in \{1.1, 1.2\}$ (left), and under the null case and alternatives with $r_y = 1.0$ and $\eta_z \in \{10, 30, 50\}$ (right). For the parameters $r_y$, $r_z$, $\eta_y$, and $\eta_z$, see Section 5.2.2.



